The Official Journal of the Institute of 
Mathematical Statistics 


Volume IX 


1938 





THE ANNALS
OF
MATHEMATICAL STATISTICS

EDITED BY

S. S. WILKS, Editor

A. T. CRAIG    J. NEYMAN

WITH THE COOPERATION OF

R. A. FISHER    R. DE MISES

T. C. FRY    E. S. PEARSON

H. HOTELLING    H. L. RIETZ

W. A. SHEWHART


it in the Annals of Mathematical Statistics
, Fine Hall, Princeton, New Jersey. Manuscripts
-spaced with wide margins, and the original copy
otes should be reduced to a minimum and whenever
graphy at the end of the paper; formulae in foot-
igures, charts, and diagrams should be drawn on
cloth in black India ink twice the size they are to
quested to keep in mind typographical difficulties
formulae.

ceive only galley proofs. Fifty reprints without
Additional reprints and covers furnished at cost.

the Annals is $4.00 per year. Single copies $1.25.
t the following rates:

ngle numbers $1.50.

Single numbers $1.25.

Orders for back numbers and other business com-
A. T. Craig, University of Iowa, Iowa City, Iowa.

matical Statistics is published quarterly by the
Statistics.


COMPOSED AND PRINTED AT THE
WAVERLY PRESS, INC.
BALTIMORE, MD., U. S. A.



CONTENTS OF VOLUME IX 

Bacon, H. M. Note on a Formula for the Multiple Correlation Coefficient .... 227
Bridger, Clyde A. Note on Regression Functions in the Case of Three
    Second-Order Random Variables .... 309
Camp, Burton H. Notes on the Distribution of the Geometric Mean .... 221
Craig, Allen T. On the Independence of Certain Estimates of Variance .... 48
DeLury, D. Note on Correlations .... 149
Dodd, Edward L. Interior and Exterior Means Obtained by the Method
    of Moments .... 153
Dwyer, Paul S. Combined Expansions of Products of Symmetric Power
    Sums and of Sums of Symmetric Power Products with Application to
    Sampling .... 1
Dwyer, Paul S. Combined Expansions of Products of Symmetric Power
    Sums and of Sums of Symmetric Power Products with Application to
    Sampling (Continued) .... 97
Dwyer, Paul S. The Computation of Moments with the Use of Cumulative
    Totals .... 288
Frankel, Lester R. See Hotelling, H.
Geiringer, H. On the Probability Theory of Arbitrarily Linked Events .... 260
Greenwood, Joseph A. Variance of a General Matching Problem .... 56
Guttman, Louis. A Note on the Derivation of a Formula for Multiple
    and Partial Correlation .... 305
Hoel, Paul G. On the Chi-square Distribution for Small Samples .... 158
Hotelling, H., and Frankel, Lester R. The Transformation of
    Statistics to Simplify their Distribution .... 87
Hsu, P. L. Notes on Hotelling's Generalized T .... 231
Keyfitz, Nathan. Graduation by a Truncated Normal .... 66
Neyman, J. Tests of Statistical Hypotheses which are Unbiased in the
    Limit .... 69
Norris, Nilan. Some Efficient Measures of Relative Dispersion .... 214
Olds, E. G. Distributions of Sums of Squares of Rank Differences for
    Small Numbers of Individuals .... 133
Olshen, C. A. Transformations of the Pearson Type III Distribution .... 176
Starkey, Daisy M. A Test of the Significance of the Difference between
    Means of Samples from Two Normal Populations without Assuming
    Equal Variances .... 201
Thompson, William R. Biological Applications of Normal Range and
    Associated Significance Tests in Ignorance of Original Distribution
    Forms .... 281
von Mises, R. A Modification of Bayes' Problem .... 256
Wald, A. A Generalization of Markoff's Inequality .... 244
Wilks, S. S. The Large-sample Distribution of the Likelihood Ratio
    for Testing Composite Hypotheses .... 60
Wilks, S. S. Shortest Average Confidence Intervals from Large Samples .... 166
Wilks, S. S. Fiducial Distributions in Fiducial Inference .... 272
Ziaud-Din, M. On Differential Operators Developed by O'Toole .... 63
Report of the Annual Meeting of the Institute of Mathematical Statistics .... 68












COMBINED EXPANSIONS OF PRODUCTS OF SYMMETRIC POWER 
SUMS AND OF SUMS OF SYMMETRIC POWER PRODUCTS 
WITH APPLICATION TO SAMPLING 1 

By Paul S. Dwyer 

PREFACE 

This article is divided into two parts. Part I has for its title “Combined
Expansions of Products of Symmetric Power Sums and of Sums of Symmetric
Power Products” and develops the general mathematical theory which is applied
in Part II to “The Fundamentals of Sampling.” Part II will appear in
a later issue of this journal.

Each part is treated as an organic unit and has its own introduction and 
bibliography. Each article is assigned a given number and each book is given a 
letter so that references can be indicated concisely in the body of the dissertation. 

Each part is divided into chapters and sections. Braces are used to indicate 
the important formulas. 

PART I. COMBINED EXPANSIONS OF PRODUCTS OF SYMMETRIC 
POWER SUMS AND OF SUMS OF SYMMETRIC POWER PRODUCTS 

Introduction 

The mathematical material which is presented here has proved useful in 
generalizing that portion of the fundamental theory of sampling in which 
relations are established between the moments of the sample and the moments 
of the parent population. It is the purpose to establish the theorems in algebraic 
form since they constitute an extension of partition and symmetric function 
theory and may be of value to someone not necessarily interested in sampling. 

A great deal of work has been done in symmetric function theory but not 
much of this is of present value to the statistician. His problem deals with 
the “power sum” while the classical theory, for the most part, deals with the 
interrelations of elementary symmetric functions and monomial symmetric 
functions. Only one phase of the reasoning developed in this investigation 
seems to have received extensive consideration previously and that is the subject 
covered in Chapter III. 

Previous authors have noted that much of symmetric function theory reduces,
with a proper choice of notation, to partition theory. It is the plan of
this treatise to present in Chapter I an outline of new partition theory which
shows how the parts of one partition are combined to form the parts of another
partition, and which serves as a means of expressing the main result of Chapters
II, III, IV, V.

1 A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in the University of Michigan.

Chapter II shows how the formulas of Chapter I are applicable to the problem 
of finding products of power sums. The multiplication theorem for power 
sums, a generalization of the multinomial theorem, is stated in terms of power 
product sums and appropriate special cases are indicated. 

Chapter III deals with the expansion of power product sums in terms of 
power sums and shows how the formulas of Chapter I may be used. 

Chapter IV is the key chapter of the paper. The problem is to expand 
products of power sums in terms of power product sums, to multiply each 
power product sum by a quantity which is a uniquely defined function of the 
quantities composing the power product sum, and then to expand back in 
terms of all possible power sums. It is shown that the results can be written 
in a compact form which also utilizes the results of Chapter I. This result, 
as is shown in Part II, is directly applicable to the sampling problem of finding 
the moments of the sample moments in terms of the moments of the universe. 
Extension is made to multivariate distributions in Chapter V. 

Chapter I. The Combination of the Parts of Partitions

It is the purpose of this chapter to provide a precise notation which shows
how the parts of one partition of r may be combined to form the parts of another
partition of r. For example, 2111, a four part partition of 5, can be made
into 32, a two part partition of 5, by combining the three unit parts into a new
part or by combining the 2 with one of the unit parts to form the 3 and the
other two unit parts to form the 2. This last formation can be made in three
different ways since any one of the unit parts might be combined with the 2.
The combination of the parts of the partition 2111 to form the parts of the
partition 32 is to be indicated symbolically by $P_{31} + 3P_{22}$ where the subscripts
indicate the number of parts collected and the coefficients indicate the number
of ways in which an equivalent collection can be made.

1. Definitions and Notation. a. Partition [(7; 105] [K; I; 1] [16; 105]. We
consider the integer r to be composed of r unit indistinguishable parcels and 
define the partitions of r to be all those different groupings into new parcels, 
each new parcel containing one or more unit parcels, such that each resultant 
grouping of parcels contains exactly all the original r unit parcels. For example 
the partitions of 4 are 

4 ; 31 ; 22 ; 211 ; 1111 

b. Parts of Partitions. The numbers of the grouped unit parcels indicate 
the parts of the partition. Thus the partitions of 4 above 

4 ; 31 ; 22 ; 211 ; 1111 have respectively 

1 ; 2 ; 2 ; 3 ; 4 parts. 



The partition 22 may also appear as $2^2$. In general a $\rho$ part partition of r is
to be designated by

$p_1 p_2 p_3 \cdots p_\rho$, where the p's may or may not be equal and where
$p_1 + p_2 + p_3 + \cdots + p_\rho = r$,

or by

$p_1^{\pi_1} \cdots p_s^{\pi_s}$, where
$p_1 \ne p_2 \ne p_3 \ne p_4 \ne \cdots \ne p_s$,
$p_1\pi_1 + p_2\pi_2 + \cdots + p_s\pi_s = r$,
$\pi_1 + \pi_2 + \cdots + \pi_s = \rho$.


c. Order of Partitions. When the parts of a partition are arranged in descending
order we say that the partition is ordered. Thus

$p_1 p_2 \cdots p_\rho$ is ordered if $p_1 \ge p_2 \ge p_3 \ge \cdots \ge p_\rho$

and $p_1^{\pi_1} \cdots p_s^{\pi_s}$ is ordered if $p_1 > p_2 > p_3 > \cdots > p_s$.

For example, $21^2$ is an ordered partition while 312 is not. Unless otherwise
specified it is hereafter assumed that all partitions are ordered.

It is sometimes convenient to refer to the order of the partition which is
the size of the largest part, $p_1$, when the partition is ordered. Thus the two
part partition, 31, is of the order 3, while the four part partition, 1111, is of
order 1. The set of the numbers $p_1^{\pi_1} \cdots p_s^{\pi_s}$ is to be known as the complete
order.

These definitions of order and part are consistent with the usual definitions.
[16; 105-106] [K; I; 1] [(?; 100]. The concept of complete order, as far as I
know, is not found in the literature.

d. Weight of Partitions. Isobaric Partitions. The weight of any partition
is defined to be the sum of all the parts of the partition. Thus the weight of
$p_1^{\pi_1} \cdots p_s^{\pi_s}$ is $p_1\pi_1 + p_2\pi_2 + \cdots + p_s\pi_s$. Partitions having the same weight
are called isobaric. Thus 4 and 211 are isobaric partitions.

e. Algebraic Partitions. If the r original units are composed of $a_1, a_2, a_3,
\cdots, a_r$ nonseparable primary units, then the result of combining these in any
possible way is to be called an algebraic partition since the r original units
are now replaced by the r algebraic quantities $a_1, a_2, \cdots, a_r$. Thus $a_1, a_2, a_3$
may be combined to form

$\overline{a_1 + a_2 + a_3}$;  $\overline{a_1 + a_2} \cdot a_3$;  $\overline{a_1 + a_3} \cdot a_2$;  $\overline{a_2 + a_3} \cdot a_1$;  $a_1 \cdot a_2 \cdot a_3$

which are the algebraic partitions of $a_1 + a_2 + a_3$.

The parts of the algebraic partitions are the resulting combinations while
the order and complete order, which indicate the numbers of algebraic expressions
combined, agree with the order and complete order of the partitions in
which the a's are unity. The weight, which is equal to the sum of the parts,
is indicated by $w = a_1 + a_2 + \cdots + a_r$. Thus if $a_1 = 5$, $a_2 = 4$, and $a_3 = 3$,
$w = 12$. It is to be noted that the algebraic partitions are formed by combining
the parts 5, 4, 3 and not by combining all parts of 12.

Now $a_1 a_2 \cdots a_r$ is itself a partition of weight $w = a_1 + a_2 + \cdots + a_r$.
If groups of the a's are alike it may be written

$a_1^{\alpha_1} a_2^{\alpha_2} \cdots a_h^{\alpha_h}$, where
$a_1\alpha_1 + a_2\alpha_2 + \cdots + a_h\alpha_h = w$,
$\alpha_1 + \alpha_2 + \cdots + \alpha_h = r$.

Algebraic partitions having the same weight are called isobaric.


f. Partition Combination Notation. Let $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ indicate the number of
different ways the r units, ordinary or algebraic, can be collected to form the
partition. Thus $\binom{5}{32}$ indicates the number of ways in which the five units
can be collected to form a partition with three units in one part and two units
in the other. Since the three units forming the first part can be selected in
${}_5C_3$ ways and since this selection automatically indicates the other two units
forming the second part, it follows that

$\binom{5}{32} = {}_5C_3 = 10$.

It is to be noted that

$\binom{4}{22} = 3$,

for if the four unit parts are $a_1, a_2, a_3, a_4$, then the three 22 partitions are

$\overline{a_1 + a_2} \cdot \overline{a_3 + a_4}$;  $\overline{a_1 + a_3} \cdot \overline{a_2 + a_4}$;  $\overline{a_1 + a_4} \cdot \overline{a_2 + a_3}$

since

$\overline{a_3 + a_4} \cdot \overline{a_1 + a_2}$;  $\overline{a_2 + a_4} \cdot \overline{a_1 + a_3}$;  $\overline{a_2 + a_3} \cdot \overline{a_1 + a_4}$

are essentially the same groupings as the first three indicated.


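2. Formula for $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$. In establishing this formula we first take the
case in which no part is repeated, i.e. $\pi_1 = \pi_2 = \cdots = \pi_s = 1$ and
$p_1 + p_2 + p_3 + \cdots + p_s = r$. In this case the formula becomes

$\binom{r}{p_1 p_2 \cdots p_s} = \frac{r!}{p_1!\, p_2! \cdots p_s!}$

This results from the fact that the $p_1$ units can be grouped in ${}_rC_{p_1}$ different ways,
the $p_2$ units then in ${}_{r-p_1}C_{p_2}$ different ways, the $p_3$ units then in
${}_{r-p_1-p_2}C_{p_3}$ different ways etc. So that

$\binom{r}{p_1 p_2 \cdots p_s} = {}_rC_{p_1} \cdot {}_{r-p_1}C_{p_2} \cdot {}_{r-p_1-p_2}C_{p_3} \cdots {}_{r-p_1-p_2-\cdots-p_{s-1}}C_{p_s} = \frac{r!}{p_1!\, p_2! \cdots p_s!}$

Compare [B; 49][19; 12]

If however $p_1 = p_2 = \cdots = p_{\pi_1}$, then the same partition has been used $\pi_1!$
different times since $p_1, p_2, \cdots, p_{\pi_1}$ may be interchanged in $\pi_1!$ different ways,
so that the value above must be divided by $\pi_1!$. By similar reasoning

$\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}} = \frac{r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}$    {1}

Compare [10; 12, 13] [I; II, 252]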
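
As an illustrative check of formula {1}, the following minimal Python sketch evaluates it for a partition given as a tuple of part sizes. The function name `ways_to_collect` is hypothetical and not part of the paper; the values printed are the ones quoted in the text for 32 and 22, together with the coefficients for the partitions of 4.

```python
from collections import Counter
from math import factorial

def ways_to_collect(parts):
    """Formula {1}: number of ways r = sum(parts) units can be collected
    into groups whose sizes are the given parts (hypothetical helper)."""
    r = sum(parts)
    denominator = 1
    for size, multiplicity in Counter(parts).items():
        # (p_i!)^(pi_i) for the units inside equal-sized groups,
        # pi_i! for interchanging the equal-sized groups themselves
        denominator *= factorial(size) ** multiplicity * factorial(multiplicity)
    return factorial(r) // denominator

print(ways_to_collect((3, 2)))   # the symbol with r = 5 and partition 32
print(ways_to_collect((2, 2)))   # the symbol with r = 4 and partition 22

# coefficients for every partition of 4
row = {p: ways_to_collect(p)
       for p in [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]}
print(row)
```

The five coefficients add up to 15, the total number of ways of grouping four distinguishable units.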


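3. Value of $\binom{a_1^{\alpha_1} a_2^{\alpha_2} \cdots a_h^{\alpha_h}}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$. The number of ways in which the r parts of
$a_1^{\alpha_1} a_2^{\alpha_2} \cdots a_h^{\alpha_h}$ may be collected to form the $\rho$ parts of $p_1^{\pi_1} \cdots p_s^{\pi_s}$ may be
indicated by

$\binom{a_1^{\alpha_1} a_2^{\alpha_2} \cdots a_h^{\alpha_h}}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$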


Thus 



and 



- 3 


Formulas useful in evaluating this expression can be worked out from the results
of this paper. A table of values of this expression for w < 8 has been given
by the author [19; 29-32].


4. Notation for Combining the Parts of a Partition. Table I. We wish
to indicate not only the number of ways in which a given r part partition of
weight w can be grouped to form a $\rho$ part partition of weight w, but also the
number of parts of the r part partition grouped to form each of the $\rho$ parts
of the $\rho$ part partition. As indicated in the opening paragraph, $P_{31} + 3P_{22}$
serves this purpose for the case in which the parts 2111 are collected to form 32.

$P\binom{a_1^{\alpha_1} a_2^{\alpha_2} \cdots a_h^{\alpha_h}}{p_1^{\pi_1} p_2^{\pi_2} \cdots p_s^{\pi_s}}$

serves this purpose in the more general
case. Its expansion gives sums of P functions whose subscripts are the numbers
of parts combined and whose coefficients are the number of ways of forming
the partitions from the parts. For example

$P\binom{1}{1} = P_1$;  $P\binom{1^3}{21} = 3P_{21}$;  etc.

$P\binom{a_1^{\alpha_1} a_2^{\alpha_2} \cdots a_h^{\alpha_h}}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ is so fundamental to the present approach that a
table is provided showing the different values when $w \le 6$.

Table I gives values of the function when w = 1, 2, 3, 4, 5, 6. The values
$a_1^{\alpha_1} \cdots a_h^{\alpha_h}$ are given in the left hand columns and the values of $p_1^{\pi_1} \cdots p_s^{\pi_s}$ in
the top row. The partitions are ordered from the top and from the left. To
find a given value, say $P\binom{2111}{32}$, we note that w = 5, look for 2111 on the left
and 32 at the top. The result is $P_{31} + 3P_{22}$. In the table the order of the
subscripts is important in indicating the number of parts collected to form the
respective parts of the ordered $p_1^{\pi_1} \cdots p_s^{\pi_s}$.

The values in the table previously mentioned [19; 29-32] may be obtained
when $w \le 6$ by placing every P in Table I equal to unity.

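5. Value of $P(a_1^{\alpha_1} a_2^{\alpha_2} \cdots a_h^{\alpha_h})$. The parts of the partition $a_1^{\alpha_1} a_2^{\alpha_2} \cdots a_h^{\alpha_h}$
may be collected to form a large number of partitions of the type $p_1^{\pi_1} \cdots p_s^{\pi_s}$.
Thus the parts of the partition 2111 may be collected to form 5, 41, 32, 311,
221, 2111. We denote by P(2111) the value

$P\binom{2111}{5}\,5 + P\binom{2111}{41}\,41 + P\binom{2111}{32}\,32 + P\binom{2111}{311}\,311 + P\binom{2111}{221}\,221 + P\binom{2111}{2111}\,2111$

$= P_4\,5 + 3P_{31}\,41 + [P_{31} + 3P_{22}]\,32 + 3P_{211}\,311 + 3P_{211}\,221 + P_{1111}\,2111$

and in general

$P(a_1^{\alpha_1} \cdots a_h^{\alpha_h}) = \sum P\binom{a_1^{\alpha_1} \cdots a_h^{\alpha_h}}{p_1^{\pi_1} \cdots p_s^{\pi_s}}\; p_1^{\pi_1} \cdots p_s^{\pi_s}$    {2}

where the summation holds for every partition $p_1^{\pi_1} \cdots p_s^{\pi_s}$ which can be formed
by combining parts of $a_1^{\alpha_1} \cdots a_h^{\alpha_h}$. The values of $P(a_1^{\alpha_1} \cdots a_h^{\alpha_h})$ for $w \le 6$
are given in the rows of Table I. Thus the value of P(2111) above is found
along the row 2111 where w = 5.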
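
The expansion of P(2111) can be reproduced by brute force. The sketch below is only an illustration, not the author's method of computing Table I: it enumerates every way of grouping the parts, records for each group its sum and the number of parts collected, and counts equal results. The names `set_partitions` and `p_expansion` are hypothetical, and listing the larger group first when two sums tie is an assumption made here for display only.

```python
from collections import Counter

def set_partitions(items):
    """Yield every grouping of `items` into unordered non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # put `first` into each existing group in turn ...
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # ... or let it form a group of its own
        yield [[first]] + partial

def p_expansion(parts):
    """Map (resulting partition, P subscript) -> number of ways."""
    terms = Counter()
    for grouping in set_partitions(list(parts)):
        # each group becomes (sum of its parts, number of parts collected)
        groups = sorted(((sum(g), len(g)) for g in grouping), reverse=True)
        target = tuple(total for total, _ in groups)
        subscript = tuple(count for _, count in groups)
        terms[(target, subscript)] += 1
    return dict(terms)

expansion = p_expansion((2, 1, 1, 1))
for (target, subscript), ways in sorted(expansion.items(), reverse=True):
    print(ways, "x P", subscript, "on partition", target)
```

Setting every P equal to unity and adding the counts for one target partition gives the number of ways of forming that partition, for instance 1 + 3 = 4 for the partition 32.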


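6. Values of $P(1^r)$ and $P(a^r)$. When $a_1 = 1$ and $\alpha_1 = r$, since there are
$\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ different ways of forming $p_1^{\pi_1} \cdots p_s^{\pi_s}$ from
the r units and each way is indicated by $P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$, we have

$P(1^r) = \sum \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}} P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\; p_1^{\pi_1} \cdots p_s^{\pi_s}$    {3}

When r = 2, 3, 4, etc., we get

$P(1^2) = P_2\,2 + P_{11}\,1^2$

$P(1^3) = P_3\,3 + 3P_{21}\,21 + P_{111}\,1^3$

$P(1^4) = P_4\,4 + 4P_{31}\,31 + 3P_{22}\,22 + 6P_{211}\,211 + P_{1111}\,1^4$

etc.

as indicated in Table I.

Similarly when $a_1 = a$ and $\alpha_1 = r$, {2} becomes

$P(a^r) = \sum \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}} P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\; (ap_1)^{\pi_1} \cdots (ap_s)^{\pi_s}$    {3'}

since there are $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ different ways of forming the partition
$(ap_1)^{\pi_1} (ap_2)^{\pi_2} \cdots (ap_s)^{\pi_s}$ from the r equal a's and each way is indicated by
$P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$.

For example

$P(a) = P\binom{a}{a}\,a = P_1\,a$

$P(a^2) = P\binom{a^2}{2a}\,2a + P\binom{a^2}{a^2}\,a^2 = P_2\,2a + P_{11}\,a^2$

$P(a^3) = P\binom{a^3}{3a}\,3a + P\binom{a^3}{2a \cdot a}\,2a \cdot a + P\binom{a^3}{a^3}\,a^3 = P_3\,3a + 3P_{21}\,2a \cdot a + P_{111}\,a^3$

$P(a^4) = P_4\,4a + 4P_{31}\,3a \cdot a + 3P_{22}\,2a \cdot 2a + 6P_{211}\,2a \cdot a^2 + P_{1111}\,a^4$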


7. Values of $P(a_1 a_2 \cdots a_r)$. From the definition

$P(a_1) = P\binom{a_1}{a_1}\,a_1 = P_1\,a_1$

$P(a_1 a_2) = P\binom{a_1 a_2}{\overline{a_1 + a_2}}\,\overline{a_1 + a_2} + P\binom{a_1 a_2}{a_1 a_2}\,a_1 a_2 = P_2\,\overline{a_1 + a_2} + P_{11}\,a_1 a_2$

$P(a_1 a_2 a_3) = P\binom{a_1 a_2 a_3}{\overline{a_1 + a_2 + a_3}}(\overline{a_1 + a_2 + a_3}) + P\binom{a_1 a_2 a_3}{\overline{a_1 + a_2} \cdot a_3}(\overline{a_1 + a_2} \cdot a_3) + P\binom{a_1 a_2 a_3}{\overline{a_1 + a_3} \cdot a_2}(\overline{a_1 + a_3} \cdot a_2) + P\binom{a_1 a_2 a_3}{\overline{a_2 + a_3} \cdot a_1}(\overline{a_2 + a_3} \cdot a_1) + P\binom{a_1 a_2 a_3}{a_1 a_2 a_3}(a_1 a_2 a_3)$

$= P_3(\overline{a_1 + a_2 + a_3}) + P_{21}\{(\overline{a_1 + a_2} \cdot a_3) + (\overline{a_1 + a_3} \cdot a_2) + (\overline{a_2 + a_3} \cdot a_1)\} + P_{111}(a_1 a_2 a_3)$

etc.

Now if complete order of the general partition indicates the number of a's
collected to form the partition, the subscripts of the P's are the respective
complete orders. If we indicate the sum of partitions having the same complete
order by the term “partition type” and indicate the partition type composed
of all terms having the same complete order $p_1^{\pi_1} \cdots p_s^{\pi_s}$ by $T_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$, then

$P(a_1) = P_1 T_1$

$P(a_1 a_2) = P_2 T_2 + P_{11} T_{11}$

$P(a_1 a_2 a_3) = P_3 T_3 + P_{21} T_{21} + P_{111} T_{111}$

$P(a_1 a_2 a_3 a_4) = P_4 T_4 + P_{31} T_{31} + P_{22} T_{22} + P_{211} T_{211} + P_{1111} T_{1111}$

etc.

and in general

$P(a_1 a_2 \cdots a_r) = \sum P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, T_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$    {4}

This formula can be used in writing the formula of Table II or formulas of
weight greater than 6. Thus

P(543) is given by $P_3 T_3 + P_{21} T_{21} + P_{111} T_{111}$

where

$T_3 = 12$, $T_{21} = 9 \cdot 3 + 8 \cdot 4 + 7 \cdot 5$, and $T_{111} = 5 \cdot 4 \cdot 3$

where the dots do not indicate multiplication, but merely the separation of
the parts.

In general $T_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ is composed of $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ partitions since this is the
number of ways in which the a's can be combined to form partitions having
the same complete order, $p_1^{\pi_1} \cdots p_s^{\pi_s}$.

Formula {3'} is a special case of this formula. If the a's are all equal, the
partitions are equal so that $T_{p_1^{\pi_1} \cdots p_s^{\pi_s}} = \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}} (ap_1)^{\pi_1} \cdots (ap_s)^{\pi_s}$.
Substitution in {4} gives {3'}. Similarly {4} gives {3} when
all the a's are unity.
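
To make the notion of partition type concrete, the sketch below groups the algebraic partitions of 5, 4, 3 by complete order. It is an illustration under stated assumptions: the helper names `groupings` and `partition_types` are hypothetical, groupings are enumerated through canonical label sequences (a swapped-in technique, not the paper's), and each partition is written as a tuple of part values rather than with separating dots.

```python
from itertools import product

def groupings(n):
    """Yield one label tuple per way of grouping n items.
    A labeling is kept only if each new label is at most one more than
    the largest label used before it, so every grouping appears once."""
    for labels in product(range(n), repeat=n):
        if all(labels[i] <= max(labels[:i], default=-1) + 1 for i in range(n)):
            yield labels

def partition_types(parts):
    """Map complete order -> list of algebraic partitions of that type."""
    types = {}
    for labels in groupings(len(parts)):
        groups = {}
        for part, label in zip(parts, labels):
            groups.setdefault(label, []).append(part)
        # list the groups that collect the most a's first
        ordered = sorted(groups.values(), key=lambda g: (-len(g), -sum(g)))
        order = tuple(len(g) for g in ordered)
        types.setdefault(order, []).append(tuple(sum(g) for g in ordered))
    return types

types = partition_types((5, 4, 3))
for order, members in types.items():
    print("T", order, "=", members)
```

The type with complete order 21 contains the three partitions with parts 9 and 3, 8 and 4, 7 and 5, in agreement with the value of $T_{21}$ quoted for P(543).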


8. Generalization from Symmetry. The function $P(a_1 a_2 \cdots a_r)$ is a symmetric
function of the parts $a_1, a_2, \cdots, a_r$, i.e., the interchange of any two of
the parts does not change the value of the function. It is possible to use this
fact as a basis of generalization and to derive {4} from {3} by its use. From
{3} we have

$P(1^r) = \sum \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}} P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\; p_1^{\pi_1} \cdots p_s^{\pi_s}$

where $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ is the number of equivalent partitions which can be
formed from the r units. In case the r units are replaced by the r different
a's, there will result $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ different partitions having the same complete
order. These $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ different partitions defined by $T_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ replace the
equivalent partitions of {2} and we have

$P(a_1 a_2 \cdots a_r) = \sum P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, T_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$    {4}

9. The Recursion Rule. It is possible to establish a recursion property by
which the value of $P(a_1 a_2 \cdots a_r a_{r+1})$ can be obtained from the value of
$P(a_1 a_2 \cdots a_r)$. We note, from the results of Table I or by {4} that

$P(3) = P_1\,3$

$P(32) = P_2\,5 + P_{11}\,32$

$P(321) = P_3\,6 + P_{21}\,51 + P_{21}\,42 + P_{12}\,33 + P_{111}\,321$

P(32) is obtained from P(3) by symbolic multiplication of its expansion, $P_1(3)$,
by the expansion of P(2), $P_1(2)$. This symbolic multiplication is accomplished
by adding the 2 to the 3 and also suffixing the 2 to the 3. If the 2 is added, the
subscripts of the P's are added while if suffixed, the subscripts of the P's are
suffixed.

More generally if $P(a_1) = P_1(a_1)$ and $P(a_2) = P_1(a_2)$, then the result
$P(a_1 a_2) = P_2(\overline{a_1 + a_2}) + P_{11}(a_1)(a_2)$ is obtained by multiplying $P_1(a_1)$ by $P_1(a_2)$
[or $P_1(a_2)$ by $P_1(a_1)$] symbolically if the subscripts are added when the a's are
added and suffixed when the a's are suffixed. Similarly
$P(a_1 a_2) = P_2(\overline{a_1 + a_2}) + P_{11}(a_1)(a_2)$ when multiplied by $P(a_3) = P_1(a_3)$ gives

$P(a_1 a_2 a_3) = P_3(\overline{a_1 + a_2 + a_3}) + P_{21}(\overline{a_1 + a_2} \cdot a_3) + P_{21}(\overline{a_1 + a_3} \cdot a_2)
+ P_{12}(a_1 \cdot \overline{a_2 + a_3}) + P_{111}(a_1 a_2 a_3)$

when the rule of multiplication is the adding of $a_3$ in turn to every part of every
partition with the appropriate adding of subscripts and the suffixing of $a_3$ to
every partition with the corresponding suffixing of subscripts. It is important
to note that the P coefficient of $a_1 \cdot \overline{a_2 + a_3}$ is $P_{12}$ and not $P_{21}$ although the term
could be written $P_{21}\,\overline{a_2 + a_3} \cdot a_1$. The applications do not demand the retention
of a given order of subscripts though the continued application of the recursion
rule does demand it.

In general the value of $P(a_1 \cdots a_r a_{r+1})$ can be obtained from the value of
$P(a_1 a_2 \cdots a_r)$ by the symbolic multiplication of the expansion $P(a_1 a_2 \cdots a_r)$
by $P_1(a_{r+1})$ since all possible algebraic partitions of $a_1 + a_2 + \cdots + a_r + a_{r+1}$
are obtained from all possible algebraic partitions of $a_1 + a_2 + \cdots + a_r$ by
adding $a_{r+1}$ in turn to each part of each partition and by suffixing it to each
partition. The corresponding P subscript, indicating the number of a's collected,
is increased by 1.

The recursion rule is useful in checking the entries of Table I. As a matter
of fact Table I was computed with its use and the order of the subscripts is
that which results from its use. The rule is also useful in finding values when
w > 6. For example, since

$P(321) = P_3\,6 + P_{21}\,51 + P_{21}\,42 + P_{12}\,33 + P_{111}\,321$

$P(3221) = P_4\,8 + P_{31}\,62 + P_{31}\,71 + P_{22}\,53 + P_{211}\,512 + P_{31}\,62 + P_{22}\,44
+ P_{211}\,422 + P_{22}\,53 + P_{13}\,35 + P_{121}\,332 + P_{211}\,521 + P_{121}\,341 + P_{112}\,323
+ P_{1111}\,3212$

$= P_4\,8 + P_{31}\,71 + 2P_{31}\,62 + (P_{31} + 2P_{22})\,53 + P_{22}\,44
+ 2P_{211}\,521 + P_{211}\,431 + P_{211}\,422 + 2P_{211}\,332 + P_{1111}\,3221$.

A useful check is based on the fact that the sum of the P coefficients of
$P(a_1 \cdots a_r)$ should equal the sum of the coefficients of $P(1^r)$. In the above
illustration the sum of the coefficients is $P_4 + 4P_{31} + 3P_{22} + 6P_{211} + P_{1111}$ as
desired.
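
The recursion rule lends itself to a short program. The following sketch is one possible encoding, not the author's procedure: a term is stored as a tuple of (part, number of a's collected) pairs in the order the rule produces them, and the names `extend`, `p_by_recursion` and `collect` are hypothetical. Applying it to the parts 3, 2, 1 and then 2 reproduces the fifteen uncollected terms of P(3221).

```python
from collections import Counter

def extend(expansion, a):
    """Symbolically multiply an expansion by P_1(a)."""
    result = []
    for term in expansion:                      # term = ((part, count), ...)
        for i, (part, count) in enumerate(term):
            # add `a` to the i-th part; its subscript is increased by 1
            result.append(term[:i] + ((part + a, count + 1),) + term[i + 1:])
        # suffix `a` as a new part, with a new subscript 1
        result.append(term + ((a, 1),))
    return result

def p_by_recursion(parts):
    """Build P(a_1 a_2 ... a_r) one part at a time, starting from P_1(a_1)."""
    expansion = [((parts[0], 1),)]
    for a in parts[1:]:
        expansion = extend(expansion, a)
    return expansion

def collect(expansion):
    """Merge terms that differ only in the order their parts are written."""
    return Counter(tuple(sorted(term, reverse=True)) for term in expansion)

p321 = p_by_recursion((3, 2, 1))
p3221 = p_by_recursion((3, 2, 1, 2))
collected = collect(p3221)

# the check described in the text: coefficient totals by subscript type
by_subscript = Counter(
    tuple(sorted((count for _, count in term), reverse=True)) for term in p3221
)
print(len(p3221), dict(by_subscript))
```

Collecting merges, for example, the two terms on the partition 62 and the two $P_{22}$ terms on 53, as in the second form of P(3221) given in the text.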

10. Use of the P Function Formulas. The P function formulas, as defined, 
represent concisely the ways in which the parts of a given partition may be 
combined to get the parts of other partitions. They are also useful in writing 
expansions of certain partition functions whose expanded values are expressed 
in terms of other partition functions. They are used, in this paper, in expressing 
the multinomial theorem, the multiplication theorem for power sums, the 
expansions of power product sums in terms of power sums, expansions of mono¬ 
mial symmetric functions in terms of power sums, the double expansion theorem 
itself, the coefficients in the double expansion theorem as well as the sampling 
laws of Part II. They are also useful in representing the expansions of different 
moment functions and can be associated with important concepts of mathe¬ 
matics and statistics such as, for example, the differences of 0. Such applica¬ 
tions, however, are not pertinent to the line of reasoning which is developed in 
Chapters II, III, IV, V. 


Chapter II

It is the purpose of this chapter to obtain formulas for the expansion of 
power sums. 

11. Definitions. a. Power Sum. Let x be a variable which is restricted to
the N variates, $x_1, x_2, x_3, \cdots, x_N$. Then the a-th power sum of the variable,
indicated by (a), is defined to be

$(a) = x_1^a + x_2^a + \cdots + x_N^a = \sum_{i=1}^{N} x_i^a$    {5}

It is assumed for the purposes of this paper that a is a positive integer or 0.

b. Power Product Sum. The expression $\sum_{i \ne j} x_i^{a_1} x_j^{a_2}$ is to be called a power
product sum since it is composed of the sum of products of the powers of the
variates. It is to be denoted by $(a_1 \cdot a_2)$ or $(a_1 a_2)$. Thus $\sum_{i \ne j} x_i^3 x_j^2 = (3 \cdot 2)$ or
(32). The value $(a \cdot a) = (a^2) = \sum_{i \ne j} x_i^a x_j^a$ is a special case of $(a_1 a_2)$ where
$a_2 = a_1 = a$. In general the power product sum is defined by the right hand
member and indicated by the left hand member of

$(a_1 a_2 \cdots a_r) = \sum_{i_1 \ne i_2 \ne \cdots \ne i_r} x_{i_1}^{a_1} x_{i_2}^{a_2} \cdots x_{i_r}^{a_r}$    {6}

If $i_1 = i_2$, the power product sum becomes

$(\overline{a_1 + a_2} \cdot a_3 \cdot a_4 \cdots a_r) = \sum_{i_2 \ne i_3 \ne \cdots \ne i_r} x_{i_2}^{a_1 + a_2} x_{i_3}^{a_3} \cdots x_{i_r}^{a_r}$    {7}

There are many different definitions since there are many different ways of
indicating equality relations among the i's. Each results in a unique power
product sum which is to be called, for brevity, a power product. If the a's
are all unity, there are many duplicates. Thus for the grouping $p_1^{\pi_1} \cdots p_s^{\pi_s}$
there are $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ equal power products $(p_1^{\pi_1} \cdots p_s^{\pi_s})$. In the more general
case we can let $T(p_1^{\pi_1} \cdots p_s^{\pi_s})$ represent the $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ different power products
having the same complete order, $p_1^{\pi_1} \cdots p_s^{\pi_s}$. We may represent any one of
these forms having this complete order by

$(q_1 q_2 q_3 \cdots q_\rho)$

where $q_1 + q_2 + q_3 + \cdots + q_\rho = w$

or by $(q_1^{\chi_1} q_2^{\chi_2} \cdots q_t^{\chi_t})$*

where $q_1\chi_1 + q_2\chi_2 + \cdots + q_t\chi_t = w$

and $\chi_1 + \chi_2 + \cdots + \chi_t = \rho$.

* “It was intended that the letter representing the exponents of the q's should be the
Greek 'chi,' and not the English 'x.'”

c. Symmetric Functions. Both the power sum and the power product are
symmetric functions of the variates since the interchange of any $x_i$ with any $x_j$
does not change the value of the function. Also the power product having
$\rho$ parts is composed of $N^{(\rho)} = N(N-1) \cdots (N-\rho+1)$ products of powers since the
first group of equal i's may be selected in N ways, the next group in N - 1 ways etc.

d. Monomial Symmetric Function. It is customary to use the monomial
symmetric function which is defined as

$\sum x_{i_1}^{q_1} x_{i_2}^{q_2} \cdots x_{i_\rho}^{q_\rho}$, the summation extending over the distinct products only,

and which we designate by $M(q_1 \cdots q_\rho)$ or by $M(q_1^{\chi_1} \cdots q_t^{\chi_t})$. This function
is not useful for our purposes since the number of terms in its expansion varies
with the number of repeated q's. For example if N = 3 and $q_1 \ne q_2$,

$M(q_1 q_2) = x_1^{q_1} x_2^{q_2} + x_1^{q_1} x_3^{q_2} + x_2^{q_1} x_1^{q_2} + x_2^{q_1} x_3^{q_2} + x_3^{q_1} x_1^{q_2} + x_3^{q_1} x_2^{q_2} = (q_1 q_2)$

while if $q_1 = q_2 = q$

$M(q^2) = x_1^q x_2^q + x_1^q x_3^q + x_2^q x_3^q = \frac{(q^2)}{2}$

The monomial symmetric function keeps the number of product terms a
minimum by eliminating all repeated terms while the power product sum
keeps the number of product terms the same by the use of repeated terms,
when some of the parts are alike.

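12. The Formula Connecting $(q_1^{\chi_1} q_2^{\chi_2} \cdots q_t^{\chi_t})$ and $M(q_1^{\chi_1} \cdots q_t^{\chi_t})$. The
power product is composed of $N^{(\rho)}$ products, each of which is repeated
$\chi_1!\,\chi_2! \cdots \chi_t!$ times. The monomial symmetric function is composed of the
$\frac{N^{(\rho)}}{\chi_1!\,\chi_2! \cdots \chi_t!}$ different products which, when repeated $\chi_1!\,\chi_2! \cdots \chi_t!$ times,
give the $N^{(\rho)}$ terms above. Hence

$(q_1^{\chi_1} \cdots q_t^{\chi_t}) = \chi_1!\,\chi_2! \cdots \chi_t!\; M(q_1^{\chi_1} \cdots q_t^{\chi_t})$    {8}

$M(q_1^{\chi_1} \cdots q_t^{\chi_t}) = \frac{(q_1^{\chi_1} \cdots q_t^{\chi_t})}{\chi_1!\,\chi_2! \cdots \chi_t!}$    {9}

In the special case in which $q_1 = 1$ and $\chi_1 = \rho$

$(1^\rho) = \rho!\, M(1^\rho)$ and $M(1^\rho) = \frac{(1^\rho)}{\rho!}$    {10}

The function $M(1^\rho)$ is commonly called an elementary symmetric function.
We refer to the corresponding $(1^\rho)$ as the unitary power product sum.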
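
A small numerical experiment can make the relation between the power product sum and the monomial symmetric function concrete. The sketch below assumes three arbitrary integer variates; the function names are hypothetical. The power product sum runs over every ordered choice of distinct variates, while the monomial symmetric function keeps each distinct product once.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod

def power_product_sum(exponents, xs):
    """(q_1 ... q_rho): sum over ordered selections of distinct variates."""
    return sum(
        prod(x ** q for x, q in zip(chosen, exponents))
        for chosen in permutations(xs, len(exponents))
    )

def monomial_symmetric(exponents, xs):
    """M(q_1 ... q_rho): every distinct product of powers counted once."""
    distinct = set()
    for idx in permutations(range(len(xs)), len(exponents)):
        # a product is identified by which exponent each variate carries
        distinct.add(frozenset(zip(idx, exponents)))
    return sum(prod(xs[i] ** q for i, q in term) for term in distinct)

def repetition_factor(exponents):
    """chi_1! chi_2! ... chi_t! for the repeated exponents."""
    return prod(factorial(m) for m in Counter(exponents).values())

xs = [2, 3, 5]                       # N = 3 variates, chosen arbitrarily
for q in [(2, 1), (2, 2), (1, 1, 1)]:
    lhs = power_product_sum(q, xs)
    rhs = repetition_factor(q) * monomial_symmetric(q, xs)
    print(q, lhs, rhs)
```

With unequal exponents the two functions agree; with the exponents 2, 2 the power product sum is twice the monomial symmetric function, as the factor $\chi_1! = 2!$ requires.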

13. Correspondence of Partitions and Power Products. To each power
product $(q_1^{\chi_1} \cdots q_t^{\chi_t})$ there corresponds an algebraic partition $q_1^{\chi_1} \cdots q_t^{\chi_t}$ having
$\rho$ parts and weight $w = a_1 + a_2 + \cdots + a_r$.

This follows at once from the definitions and notation. Thus if
$w = a_1 + a_2 + a_3$, the power product

$\sum_{i \ne j} x_i^{a_1} x_i^{a_2} x_j^{a_3} = \sum_{i \ne j} x_i^{a_1 + a_2} x_j^{a_3} = (\overline{a_1 + a_2} \cdot a_3)$

is, by notation, associated with the partition $\overline{a_1 + a_2} \cdot a_3$. Conversely each
algebraic partition, when enclosed in parentheses, represents a power product
sum.

This proposition is useful in that it enables one to establish a relationship
between the theory of power product sums and the theory of partitions. Earlier
writers have used a similar correspondence in relating the theory of monomial
symmetric functions to that of partitions. See for instance [3; 106], [4; 5]
[5; I; 7].

Due to this correspondence we do not hesitate to apply such terms as part,
order, complete order, similar, etc. to the power product as well as to the
algebraic partition. Also the sum of all power products $(q_1^{\chi_1} \cdots q_t^{\chi_t})$ having
the same complete order is represented by $T(p_1^{\pi_1} \cdots p_s^{\pi_s})$. This represents
the sum of $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ similar power products.


14. The Multiplication Theorem for Power Sums. The correspondence
property enables us to derive a theorem, to be known as the multiplication
theorem, which expresses products of power sums in terms of power products.
The type of argument is introduced by establishing simple cases of the theorem

$(a_1)(a_2) = \left(\sum_{i_1=1}^{N} x_{i_1}^{a_1}\right)\left(\sum_{i_2=1}^{N} x_{i_2}^{a_2}\right) = \sum_{i_1 = i_2} x_{i_1}^{a_1} x_{i_2}^{a_2} + \sum_{i_1 \ne i_2} x_{i_1}^{a_1} x_{i_2}^{a_2} = (\overline{a_1 + a_2}) + (a_1 a_2)$

$(a_1)(a_2)(a_3) = \sum_{i_1, i_2, i_3} x_{i_1}^{a_1} x_{i_2}^{a_2} x_{i_3}^{a_3}
= (\overline{a_1 + a_2 + a_3}) + (\overline{a_1 + a_2} \cdot a_3) + (\overline{a_1 + a_3} \cdot a_2) + (a_1 \cdot \overline{a_2 + a_3}) + (a_1 \cdot a_2 \cdot a_3)$

since the value $\sum_{i_1, i_2, i_3}$ is broken into

$\sum_{i_1 = i_2 = i_3}$,  $\sum_{i_1 = i_2 \ne i_3}$,  $\sum_{i_1 = i_3 \ne i_2}$,  $\sum_{i_1 \ne i_2 = i_3}$,  $\sum_{i_1 \ne i_2 \ne i_3}$.

In general, when $r \le N$

$(a_1)(a_2) \cdots (a_r) = \sum_{i_1, i_2, \cdots, i_r} x_{i_1}^{a_1} x_{i_2}^{a_2} \cdots x_{i_r}^{a_r}$

and this can be broken into summations featuring different equality relations.
These summations define all the different power product sums of weight
$w = a_1 + a_2 + \cdots + a_r$. The different algebraic partitions of $a_1 + a_2 + a_3 + \cdots + a_r$
correspond to the different power product sums. It follows at once that the
value of $(a_1)(a_2) \cdots (a_r)$ is obtained by writing each algebraic partition of
$a_1 + a_2 + \cdots + a_r$, enclosing it in parentheses to represent a power product,
and adding. More symbolically we have

$(a_1)(a_2) \cdots (a_r) = \sum (q_1^{\chi_1} \cdots q_t^{\chi_t})$    {11}

where $q_1^{\chi_1} \cdots q_t^{\chi_t}$ represents any algebraic partition of $a_1 + a_2 + \cdots + a_r$
and the summation holds for all such partitions, or by

$(a_1)(a_2) \cdots (a_r) = \sum T(p_1^{\pi_1} \cdots p_s^{\pi_s})$    {12}

where $T(p_1^{\pi_1} \cdots p_s^{\pi_s})$ represents the $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ similar power products and
the summation holds for each different complete order.

For example $(a_1)(a_2)(a_3) = T(3) + T(21) + T(111)$
and $T(3) = (\overline{a_1 + a_2 + a_3})$, $T(21) = (\overline{a_1 + a_2} \cdot a_3) + (\overline{a_1 + a_3} \cdot a_2) + (\overline{a_2 + a_3} \cdot a_1)$,
and $T(111) = (a_1 \cdot a_2 \cdot a_3)$.

The theorem has been established on the assumption that $r \le N$. If such
is not the case it is possible to satisfy the assumption by adding additional
variates, $x_{N+1}, x_{N+2}, \cdots, x_r$, all 0, without changing the value of the power
sums or of the product of the power sums since the added terms are always 0.
Thus

$(x_1^a + x_2^a)(x_1^b + x_2^b)(x_1^c + x_2^c) = (x_1^a + x_2^a + x_3^a)(x_1^b + x_2^b + x_3^b)(x_1^c + x_2^c + x_3^c)$

when $x_3 = 0$.

Then

$(a)(b)(c) = (\overline{a + b + c}) + (\overline{a + b} \cdot c) + (\overline{a + c} \cdot b) + (\overline{b + c} \cdot a) + (a \cdot b \cdot c)$

which is

$\sum_{i} x_i^{a+b+c} + \sum_{i \ne j} x_i^{a+b} x_j^c + \sum_{i \ne j} x_i^{a+c} x_j^b + \sum_{i \ne j} x_i^{b+c} x_j^a + \sum_{i \ne j \ne k} x_i^a x_j^b x_k^c$

The term $\sum_{i \ne j \ne k} x_i^a x_j^b x_k^c = 0$ since every product composing it contains an $x_3 = 0$.
The other power product sums are to be applied to the original variates only
since the terms involving $x_3$ are 0 in every case.

In general, if r > N, it is only necessary to write out the power product
sums having N or less parts since all those having more than N parts will be 0.
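
The argument of this section, breaking the unrestricted sum over all index tuples into pieces according to the equality relations among the indices, can be imitated numerically. The sketch below is an illustration with arbitrary values (two variates and exponents 1, 2, 3, so that r > N); all function names are hypothetical.

```python
from itertools import permutations, product
from math import prod

def power_sum(a, xs):
    """(a): the a-th power sum of the variates."""
    return sum(x ** a for x in xs)

def power_product_sum(exponents, xs):
    """(q_1 ... q_rho): sum over index tuples whose entries all differ."""
    return sum(
        prod(xs[i] ** q for i, q in zip(idx, exponents))
        for idx in permutations(range(len(xs)), len(exponents))
    )

def pattern(indices):
    """Equality relation of an index tuple, e.g. (4, 7, 4) -> (0, 1, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(i, len(first_seen)) for i in indices)

def split_by_equality(exponents, xs):
    """Break the sum over all index tuples into one sum per equality relation."""
    sums = {}
    for indices in product(range(len(xs)), repeat=len(exponents)):
        term = prod(xs[i] ** a for i, a in zip(indices, exponents))
        key = pattern(indices)
        sums[key] = sums.get(key, 0) + term
    return sums

xs = [2, 3]                      # N = 2 variates
a, b, c = 1, 2, 3                # r = 3 power sums, so r > N
pieces = split_by_equality((a, b, c), xs)
lhs = power_sum(a, xs) * power_sum(b, xs) * power_sum(c, xs)
print(lhs, pieces)
```

Each piece equals the power product sum of the corresponding algebraic partition, and no piece arises for three unequal indices because only two variates are available, which is the case r > N discussed above.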


15. The Multiplication Theorem Using the Results of Chapter I. Comparison
of {12} with {4} shows that {12} can be obtained from {4} by placing
$P(a_1 \cdots a_r) = (a_1)(a_2) \cdots (a_r)$, $T_{p_1^{\pi_1} \cdots p_s^{\pi_s}} = T(p_1^{\pi_1} \cdots p_s^{\pi_s})$ and $P_{p_1^{\pi_1} \cdots p_s^{\pi_s}} = 1$.

Since this can be done for all values of a and r it follows at once that the entire
theory of Chapter I is applicable to the present problem. For example Table I
shows that

$P(321) = P_3\,6 + P_{21}\,51 + P_{21}\,42 + P_{12}\,33 + P_{111}\,321$

and it follows that

$(3)(2)(1) = (6) + (51) + (42) + (33) + (321)$

It should be noted that it is possible to use the table previously published
[19; 29-32] since the entries in this table are the values obtained when
$P_{p_1^{\pi_1} \cdots p_s^{\pi_s}} = 1$. The value (3)(2)(1) may also be checked from this table.


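16. The Multinomial Theorem. The multinomial theorem is a special case
of the multiplication theorem for power sums in which the power sums are all
equal. If $a_1 = a_2 = \cdots = a_r = 1$,

$$T(p_1^{\pi_1}\cdots p_s^{\pi_s}) = \binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}\,(p_1^{\pi_1}\cdots p_s^{\pi_s})$$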




COMBINED EXPANSIONS 


17 


and {12} becomes

$$(1)^r = \sum \binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}\,(p_1^{\pi_1}\cdots p_s^{\pi_s}) \qquad \{13\}$$

which is the multinomial theorem in terms of power product sums. Special
cases are

$(1)^2 = (2) + (11)$

$(1)^3 = (3) + 3(21) + (111)$

$(1)^4 = (4) + 4(31) + 3(22) + 6(211) + (1111)$

etc.
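The coefficients in these special cases follow from the symbol $r!/\big((p_1!)^{\pi_1}\cdots(p_s!)^{\pi_s}\,\pi_1!\cdots\pi_s!\big)$. A short sketch of that computation is given here; the helper names `partitions` and `multinomial_coefficient` are illustrative and do not come from the paper.

```python
from collections import Counter
from math import factorial, prod


def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for tail in partitions(n - part, part):
            yield (part,) + tail


def multinomial_coefficient(partition):
    """r! / (p_1!^pi_1 ... p_s!^pi_s * pi_1! ... pi_s!) for a partition of r."""
    r = sum(partition)
    denom = prod(factorial(p) for p in partition)
    # one factorial for each group of repeated parts
    denom *= prod(factorial(m) for m in Counter(partition).values())
    return factorial(r) // denom


# coefficients of the expansion of (1)^4, keyed by partition
expansion_of_1_to_4 = {p: multinomial_coefficient(p) for p in partitions(4)}
```

The dictionary `expansion_of_1_to_4` reproduces the third special case above.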

The result {13} may also be obtained immediately from {3} by placing
$P(1^r) = (1)^r$, $P_{p_1^{\pi_1}\cdots p_s^{\pi_s}} = 1$, and $p_1^{\pi_1}\cdots p_s^{\pi_s} = (p_1^{\pi_1}\cdots p_s^{\pi_s})$.

A more general form of the multinomial theorem is that in which $a_1 = a_2 = \cdots = a_r = a$. In this case

$$(x_1^a + x_2^a + \cdots + x_N^a) = (a)$$

and {12} gives

$$(a)^r = \sum \binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}\,\big((ap_1)^{\pi_1}\cdots(ap_s)^{\pi_s}\big) \qquad \{14\}$$

where $((ap_1)^{\pi_1}\cdots(ap_s)^{\pi_s})$ has parts $ap_1, \ldots, ap_s$. Thus

$$(a)^3 = (3a) + 3(2a \cdot a) + (a^3)$$

so that

$$(2)^3 = (6) + 3(4 \cdot 2) + (2^3).$$

The result {14} may also be obtained immediately from {3'} by placing $P(a^r) = (a)^r$,
$P_{p_1^{\pi_1}\cdots p_s^{\pi_s}} = 1$, and $(ap_1)^{\pi_1}\cdots(ap_s)^{\pi_s} = ((ap_1)^{\pi_1}\cdots(ap_s)^{\pi_s})$. When
$N = 2$, {13} gives the binomial theorem

$$(1)^r = \sum \binom{r}{p_1 p_2}\,(p_1 p_2)$$

special cases of which are

$(1)^2 = (2) + (11)$

$(1)^3 = (3) + 3(21)$

$(1)^4 = (4) + 4(31) + 3(22)$

$(1)^5 = (5) + 5(41) + 10(32)$

etc.

These can be readily translated to the usual form. Thus

$$(a + b)^4 = a^4 + b^4 + 4(a^3 b + b^3 a) + 3(a^2 b^2 + b^2 a^2).$$

In a similar manner the trinomial theorem appears as

$$(1)^r = \sum \binom{r}{p_1 p_2 p_3}\,(p_1 p_2 p_3).$$

A special case of the multinomial theorem {13} is also useful in writing $N^r$
in terms of sums of $N^{(\rho)}$. When the variates are all unity the power sums are
all $N$ and each power product sum is the number of terms in the partition
representing it. If a partition has $\rho$ parts the number of terms in it is $N^{(\rho)} = N(N-1)\cdots(N-\rho+1)$.
We then have

$$N^r = \sum \binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}\,N^{(\rho)} \qquad \{15\}$$

Special cases are

$N^2 = N + N^{(2)}$

$N^3 = N + 3N^{(2)} + N^{(3)}$

$N^4 = N + 4N^{(2)} + 3N^{(2)} + 6N^{(3)} + N^{(4)} = N + 7N^{(2)} + 6N^{(3)} + N^{(4)}$

etc.
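Collecting the terms of {15} by the number of parts $\rho$ gives coefficients 1, 7, 6, 1 for $r = 4$, which are the Stirling numbers of the second kind. The sketch below uses that identification (an observation added here, not a statement from the paper) to check {15} for small $N$ and $r$; the function names are illustrative.

```python
from math import prod


def stirling2(r, k):
    """Number of ways to split r labelled units into k non-empty groups."""
    if r == k:
        return 1
    if k == 0 or k > r:
        return 0
    return k * stirling2(r - 1, k) + stirling2(r - 1, k - 1)


def falling_factorial(n, k):
    """N^(k) = N (N - 1) ... (N - k + 1)."""
    return prod(n - i for i in range(k))


def power_via_factorials(n, r):
    """Right-hand side of {15}, with the terms collected by number of parts."""
    return sum(stirling2(r, k) * falling_factorial(n, k) for k in range(1, r + 1))
```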

17. The Use of Monomial Symmetric Functions. It is possible to express
the results in terms of the monomial symmetric functions by means of {8}.
Thus

$$(2)(2)(2) = (6) + 3(42) + (222) = M(6) + 3M(42) + 6M(222).$$

In general, Table I may be used to express products of power sums in terms
of monomial symmetric functions. It is only necessary to place every $P_{p_1^{\pi_1}\cdots p_s^{\pi_s}} = 1$
and to multiply by the factorials indicating the repeated entries at the
head of each column. The table [19; 29-32] may be used similarly.

The multinomial theorem in terms of monomial symmetric functions becomes

$$(1)^r = \sum \binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}\,\pi_1!\,\pi_2!\cdots\pi_s!\;M(p_1^{\pi_1}\cdots p_s^{\pi_s})$$

and by {1}

$$(1)^r = \sum \frac{r!}{(p_1!)^{\pi_1}(p_2!)^{\pi_2}\cdots(p_s!)^{\pi_s}}\;M(p_1^{\pi_1}\cdots p_s^{\pi_s}) \qquad \{16\}$$

as it is conventionally stated.





18. The Multiplication Theorem from the Multinomial Theorem. It is
possible to use generalization from symmetry in deriving the multiplication
theorem from the multinomial theorem though this can not well be done from
its conventional statement {16}. The monomial symmetric function does not
have the property that $M(a \cdot b) = M(a \cdot a)$ when $b = a$ while $(a \cdot b)$ does become
$(a \cdot a)$ when $b = a$. The first step then is to reduce {16} to power product
sums by means of {9}. We then have

$$(1)^r = \sum \frac{r!}{(p_1!)^{\pi_1}\cdots(p_s!)^{\pi_s}\,\pi_1!\cdots\pi_s!}\,(p_1^{\pi_1}\cdots p_s^{\pi_s}).$$

Next it is necessary to introduce the factor $\binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}$ for there are many
equal terms for each value $p_1^{\pi_1}\cdots p_s^{\pi_s}$ when the $a$'s are all unity. This is very
easy in this case since the value of the coefficient of $(p_1^{\pi_1}\cdots p_s^{\pi_s})$ is $\binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}$.
It follows at once that

$$(1)^r = \sum \binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}\,(p_1^{\pi_1}\cdots p_s^{\pi_s}).$$

Suppose that the $r$ units are replaced by $a_1, a_2, \ldots, a_r$. Then the $\binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}$ equal
power products, $(p_1^{\pi_1}\cdots p_s^{\pi_s})$, will be replaced by the $\binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}$ different
power products composing $T(p_1^{\pi_1}\cdots p_s^{\pi_s})$. It follows at once that

$$(a_1)(a_2)\cdots(a_r) = \sum T(p_1^{\pi_1}\cdots p_s^{\pi_s}).$$



19. The Determination of the Coefficient of a Given Power Product in the 
Expansion of a Product of Power Sums. In some cases we wish to determine 
the coefficient of a given power product without computing the complete 


expansion. 


This is given by P 



where the P coefficients are unity. 


Thus the coefficient of (32) in the expansion of (2)(1)(1)(1) is found from 



$= P_{31} + 3P_{22}$ and is 4.


20. Relation to Previous Results. The multiplication theorem may be
viewed as a generalization of the multinomial theorem. A more general proof, 
applicable to multivariate problems, could be presented with the use of more 
involved notation. It seems wise rather to present the simpler one variate 
case and to emphasize the principle of generalization from symmetry which 
will enable us to write the multivariate laws with relative ease. 

The general problem discussed here seems to have received a very small
amount of consideration as much of the extensive classical theory of symmetric
functions is limited to the interrelations of the elementary symmetric
functions and the monomial symmetric functions.

A monumental work on symmetric functions not subject to this limitation
is the Combinatory Analysis of MacMahon [K]. MacMahon provided a
technique for multiplying power sums in many variables as a special case of a
more general theory [K; II, 321].

Some of the work on alternants is closely related to the problem of products
of power sums although the alternant, as usually defined, is limited to the case
in which $r = N$ [I; II, 446]. For an example the reader is referred to a development
by Muir [L; 335-6].

Thiele (1889) gave tables⁴ of products of power sums in terms of monomial
symmetric functions for partition products of weight ≤ 8 [H; 114-117]. J. R.
Roe has later given one for w ≤ 10 [N; Plates 17, 18]. Statisticians have sometimes
stated the results in nontabular form. See, for example, the multiplication
formulas of Church [13; 81-83] [14; 370-1], whose results may not at
first appear to agree with those above since Church has used a less compact
notation and, of course, the monomial symmetric function.

The chief contributions of the present attack are 

1. The use of the formulas and tables of Chapter I in writing expansions of 
products of power sums. 

2. The use of power product sums in place of monomial symmetric functions,
which makes feasible

3. Generalization from symmetry.

Chapter III 

It is the purpose of this chapter to establish formulas giving the expansion 
of power products in terms of products of power sums. 

21. The Binet (Waring) Identities. It is customary to introduce this subject
with formulas for $M(a \cdot b)$, $M(a \cdot b \cdot c)$, etc. so we first derive the formulas for
$(a \cdot b)$, $(a \cdot b \cdot c)$, etc. We may use the results of Chapter II since the problem
here is the inverse of the multiplication problem. By the multiplication
theorem

$(a)(b) = (a + b) + (a \cdot b)$

$(a)(b)(c) = (a + b + c) + (a + b \cdot c) + (a + c \cdot b) + (b + c \cdot a) + (a \cdot b \cdot c)$

$(a + b)(c) = (a + b + c) + (a + b \cdot c)$


⁴ These tables are not accessible to me, but Thiele refers to them in his “Theory of
Observations.”





so we get

$$(a \cdot b) = (a)(b) - (a + b) \qquad \{17\}$$

$$(a \cdot b \cdot c) = (a)(b)(c) - (a + b)(c) - (a + c)(b) - (b + c)(a) + 2(a + b + c) \qquad \{18\}$$

Similarly

$$\begin{aligned}(a \cdot b \cdot c \cdot d) ={}& (a)(b)(c)(d) - (a + b)(c)(d) - (a + c)(b)(d) - (a + d)(b)(c) \\ &- (b + c)(a)(d) - (b + d)(a)(c) - (c + d)(a)(b) \\ &+ (a + b)(c + d) + (a + c)(b + d) + (a + d)(b + c) \\ &+ 2(a + b + c)(d) + 2(a + b + d)(c) + 2(a + c + d)(b) + 2(b + c + d)(a) \\ &- 6(a + b + c + d) \end{aligned} \qquad \{19\}$$

When $a \ne b \ne c \ne d$, {17}, {18}, {19} are also the formulas for $M(ab)$, $M(abc)$,
$M(abcd)$. These formulas are quite commonly attributed to Binet who gave
them in 1812 in connection with certain proofs of determinant theory [1; 284]
[I; I; 81]. Waring should be given credit (see Miscellanea Analytica 1762).
Binet gave no proof. The reader is also referred to the earlier work of Paoli
[A; section 28].
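The three- and four-part identities can be confirmed by direct summation over distinct subscripts. The sketch below is an added numerical check with illustrative function names; in the four-part identity the products of two two-part sums, such as $(a+b)(c+d)$, enter with a positive sign, in agreement with the coefficient $+T(2)(2)$ in the later formula for $(a_1 a_2 a_3 a_4)$.

```python
from itertools import permutations
from math import prod


def S(a, xs):
    """Power sum (a): sum of x^a over the variates."""
    return sum(x ** a for x in xs)


def power_product_sum(exponents, xs):
    """Sum of x_i^a x_j^b ... over all choices of distinct subscripts."""
    return sum(
        prod(xs[i] ** e for i, e in zip(idx, exponents))
        for idx in permutations(range(len(xs)), len(exponents))
    )


def three_part(a, b, c, xs):
    """Right-hand side of the three-part identity."""
    return (S(a, xs) * S(b, xs) * S(c, xs)
            - S(a + b, xs) * S(c, xs) - S(a + c, xs) * S(b, xs)
            - S(b + c, xs) * S(a, xs) + 2 * S(a + b + c, xs))


def four_part(a, b, c, d, xs):
    """Right-hand side of the four-part identity."""
    s = lambda k: S(k, xs)
    return (s(a) * s(b) * s(c) * s(d)
            - s(a + b) * s(c) * s(d) - s(a + c) * s(b) * s(d) - s(a + d) * s(b) * s(c)
            - s(b + c) * s(a) * s(d) - s(b + d) * s(a) * s(c) - s(c + d) * s(a) * s(b)
            + s(a + b) * s(c + d) + s(a + c) * s(b + d) + s(a + d) * s(b + c)
            + 2 * s(a + b + c) * s(d) + 2 * s(a + b + d) * s(c)
            + 2 * s(a + c + d) * s(b) + 2 * s(b + c + d) * s(a)
            - 6 * s(a + b + c + d))
```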

A much more adequate treatment was given by Hirsch in the early 19th
century [B; 35-38]. He wrote out the terms for $M(a \cdot b \cdot c \cdot d \cdot e)$ and indicated
a scheme for extending the results. More than this he proved that any “numerical
expression” (his term for monomial symmetric function) can be reduced
to numerical expressions having one less part [B; 26]. The continued application
of this theorem leads eventually to numerical expressions having only one part,
i.e. to power sums. Hence all numerical expressions can be reduced to power
sums [B; 27, 32].

Recent authors give essentially the same proof. See for example Bôcher
[J; 241-242] who states the theorem, “Every symmetric polynomial is a linear
combination with constant coefficients of a certain number of the Σ's.” See
also O’Toole [16; 114] and Burnside and Panton [E; 167]. Thus modern authors
provide a proof of the fact that $M(a_1 \cdots a_r)$ can be expanded in terms of power
sums but most of them fail to provide a formula giving this precise expansion.
Even MacMahon after writing the values of Af(\«r), M(X 2 ), M(X 3 ) 

avoids the immediate generalization by stating [K; I; 7 ], “in actual practice 
there are easier ways of calculating the many part functions and the general 
formula is of little importance.” 

While MacMahon’s statement has a certain amount of truth in that any
given monomial symmetric function may be computed from others having one
less part by the recursion property described by Hirsch, yet there are many
cases in which a definite formula, rather than a method, is desirable. A formula
particularly is demanded by the statistician who is working with a large number
of monomial symmetric functions simultaneously. See for example the remarks
and efforts of Carver [15; 103-104, 110-120], Church [14; 373, 377-378], and
O’Toole [16; 115].

Some authors have provided solutions and it appears that statisticians are
not entirely familiar with all the work which has previously been done. It
is the aim of the remainder of this chapter to suggest references which make
previous work available to statisticians as well as to present a logical and quite
complete development. The main results are not essentially new although
their explicit statement in the language of power products is necessary for the
development of the next chapter. The argument features the easy generalization
from symmetry. The value of $(1^r)$ is expressed in such a form that the
value $(a_1 \cdots a_r)$ may be obtained immediately from it.

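22. The Value of $(1^r)$ from Waring’s Expansion for the Elementary Symmetric
Function. We first derive the formula $(1^r)$ from the conventional
Waring’s expression for $p_m$ in terms of the power sums. Burnside and Panton
[E; II; 92] give this as

$$p_m = \sum \frac{(-1)^{r_1 + r_2 + \cdots + r_m}\; s_1^{r_1} s_2^{r_2} \cdots s_m^{r_m}}{\Gamma(r_1 + 1)\,\Gamma(r_2 + 1)\cdots\Gamma(r_m + 1)\; 2^{r_2}\, 3^{r_3} \cdots m^{r_m}} \qquad \{20\}$$

where $p_m = (-1)^m M(1^m)$ and where $s_1^{r_1} s_2^{r_2} \cdots s_m^{r_m}$ is any $r_1 + r_2 + \cdots + r_m$
part partition of $m$. When $m = r$ and $p_1^{\pi_1} \cdots p_s^{\pi_s}$ is any $\rho$ part partition of $r$,
{20} becomes

$$(-1)^r M(1^r) = \sum \frac{(-1)^{\rho}\,(p_1)^{\pi_1} \cdots (p_s)^{\pi_s}}{p_1^{\pi_1} \cdots p_s^{\pi_s}\; \pi_1!\,\pi_2! \cdots \pi_s!} \qquad \{21\}$$

Dividing by $(-1)^r$ and noting that $(-1)^{\rho - r} = (-1)^{r - \rho}$ we have

$$M(1^r) = \sum \frac{(-1)^{r-\rho}\,(p_1)^{\pi_1} \cdots (p_s)^{\pi_s}}{p_1^{\pi_1} \cdots p_s^{\pi_s}\; \pi_1!\,\pi_2! \cdots \pi_s!} \qquad \{22\}$$

and hence that

$$(1^r) = r!\,M(1^r) = \sum \frac{(-1)^{r-\rho}\, r!\,(p_1)^{\pi_1} \cdots (p_s)^{\pi_s}}{p_1^{\pi_1} \cdots p_s^{\pi_s}\; \pi_1!\,\pi_2! \cdots \pi_s!} \qquad \{23\}$$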


A second proof of {23}, given in the next sections, does not assume the
formula {20} and develops by easy stages. Although somewhat longer than
the method above, it contacts much of the work that has been done in this
field. It also provides two useful arithmetic checks dealing with the coefficients
which the more analytic method above does not provide. Those who are
familiar with {20} above and are interested in the immediate development of the
argument with the use of {23} should turn to the equivalent {38} of section 28.





23. The Newtonian Formulas. The development begins with the well
known formulas connecting the power sums and the elementary symmetric
functions which appeared in Newton's Arithmetica Universalis. These formulas
are given by Bôcher (J; 244) as follows

$$s_k - p_1 s_{k-1} + \cdots + (-1)^{k-1} p_{k-1} s_1 + (-1)^k k\,p_k = 0, \qquad k = 1, 2, \ldots \qquad \{24\}$$

where $s_k$ is the sum of the $k$-th powers and $p_i$ is the $i$-th elementary symmetric
function.

So many proofs of this theorem are accessible that a repetition here is hardly
justifiable. A proof using calculus was given by Bôcher (J; 243). Proofs
using algebra only were given by Hirsch (B; 16) and Chrystal (F; I; 437).
Muirhead (9; 66-70) gave three proofs of which the second is perhaps best
adapted to the present development.
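As an added illustration (not one of the proofs cited above), the Newtonian formulas can be checked numerically for any set of variates. The names `power_sum`, `elementary` and `newton_residual` are illustrative; the residual should vanish for every $k$, including $k$ greater than the number of variates, where the higher elementary symmetric functions are zero.

```python
from itertools import combinations
from math import prod


def power_sum(k, xs):
    """s_k: the sum of the k-th powers of the variates."""
    return sum(x ** k for x in xs)


def elementary(i, xs):
    """p_i: the i-th elementary symmetric function (p_0 = 1)."""
    return sum(prod(c) for c in combinations(xs, i))


def newton_residual(k, xs):
    """s_k - p_1 s_{k-1} + ... + (-1)^(k-1) p_{k-1} s_1 + (-1)^k k p_k."""
    total = sum((-1) ** i * elementary(i, xs) * power_sum(k - i, xs) for i in range(k))
    return total + (-1) ** k * k * elementary(k, xs)
```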

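24. The Determinant Equivalent of $(1^r)$. It is usual to solve the Newtonian
equations for the power sums (J; 244) but our objective is the solution for the
elementary symmetric functions in terms of the power sums. The equations are

$p_1 = (1)$

$(1)p_1 - 2p_2 = (2)$

$(2)p_1 - (1)p_2 + 3p_3 = (3)$

$(3)p_1 - (2)p_2 + (1)p_3 - 4p_4 = (4)$

whence

$$p_r = \frac{\begin{vmatrix} 1 & 0 & 0 & \cdots & 0 & (1) \\ (1) & -2 & 0 & \cdots & 0 & (2) \\ (2) & -(1) & 3 & \cdots & 0 & (3) \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ (r-2) & -(r-3) & (r-4) & \cdots & (-1)^{r-2}(r-1) & (r-1) \\ (r-1) & -(r-2) & (r-3) & \cdots & (-1)^{r-2}(1) & (r) \end{vmatrix}}{\begin{vmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ (1) & -2 & 0 & \cdots & 0 & 0 \\ (2) & -(1) & 3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ (r-2) & -(r-3) & (r-4) & \cdots & (-1)^{r-2}(r-1) & 0 \\ (r-1) & -(r-2) & (r-3) & \cdots & (-1)^{r-2}(1) & (-1)^{r-1} r \end{vmatrix}}$$

Here the diagonal entries $1, -2, 3, \ldots, (-1)^{r-2}(r-1), (-1)^{r-1}r$ are ordinary integers; every other parenthesized entry is a power sum.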


Next, factor out all the negative signs in the even numbered columns in each
determinant. The number of these columns of negative signs in the numerator is the same as
the number in the denominator if $r$ is odd. If $r$ is even, there is one more in
the denominator. Hence the negative signs may be dropped in both determinants
if $(-1)^{r-1}$ is inserted in the numerator. Furthermore the value of
the determinant in the denominator is $r!$. Next, change the numerator by
moving the $r$-th column to the first column position and inserting the compensating
factor $(-1)^{r-1}$. If $\Delta_r$ represents the resulting numerator determinant,
the value of $p_r$ becomes

$$p_r = \frac{\Delta_r}{r!}$$

and, since $p_r = M(1^r) = (1^r)/r!$, we have then

$$(1^r) = \Delta_r = \begin{vmatrix} (1) & 1 & 0 & \cdots & 0 & 0 \\ (2) & (1) & 2 & \cdots & 0 & 0 \\ (3) & (2) & (1) & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ (r-1) & (r-2) & (r-3) & \cdots & (1) & r-1 \\ (r) & (r-1) & (r-2) & \cdots & (2) & (1) \end{vmatrix} \qquad \{25\}$$

The determinant has received the attention of earlier writers (19; 3). Generalizations
of it will be mentioned at the close of the chapter. Its expansion
in terms of power sums is known and may be written

$$\Delta_r = \sum \frac{(-1)^{r-\rho}\, r!\,(p_1)^{\pi_1} \cdots (p_s)^{\pi_s}}{p_1^{\pi_1} \cdots p_s^{\pi_s}\; \pi_1!\,\pi_2! \cdots \pi_s!} \qquad \{26\}$$

where $p_1\pi_1 + p_2\pi_2 + \cdots + p_s\pi_s = r$

and $\pi_1 + \pi_2 + \cdots + \pi_s = \rho$.

See for example O’Toole (16; 113).

It is at once evident that {26} is equivalent to {23}. Those who are familiar
with the expansion of $\Delta_r$ above may wish to turn immediately to {38} of section 28
since the intervening sections are devoted to a rather detailed and rigorous
expansion of the determinant. This development follows, in a general way,
that given by Mola (5; 190-195).
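The determinant $\Delta_r$ can be evaluated for particular variates and compared with $r!$ times the elementary symmetric function. The following sketch assumes the layout of $\Delta_r$ described above (power sums on and below the diagonal, the integers $1, 2, \ldots, r-1$ just above it); the helper names are illustrative and the Laplace expansion is only meant for small $r$.

```python
from itertools import combinations
from math import factorial, prod


def delta_matrix(r, xs):
    """Matrix of Delta_r: entry (i, j) is the power sum (i - j + 1) for j <= i,
    and the integer i just above the diagonal (1-based row index i)."""
    s = lambda k: sum(x ** k for x in xs)
    m = [[0] * r for _ in range(r)]
    for i in range(r):
        for j in range(i + 1):
            m[i][j] = s(i - j + 1)
        if i + 1 < r:
            m[i][i + 1] = i + 1
    return m


def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
        if m[0][j] != 0
    )


def elementary(r, xs):
    """M(1^r): the r-th elementary symmetric function."""
    return sum(prod(c) for c in combinations(xs, r))
```

For four variates the determinant of order five vanishes, as it should, since $(1^5)$ has more parts than there are variates.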

25. The Expansion of $(1^r) = \Delta_r$. The determinant, $\Delta_r$, is a special type
of determinant which is known as a recurrent. There is a simple recursion
property which is useful in its expansions in terms of products of power sums.

$$\Delta_{r+1} = \begin{vmatrix} (1) & 1 & 0 & \cdots & 0 & 0 \\ (2) & (1) & 2 & \cdots & 0 & 0 \\ (3) & (2) & (1) & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ (r) & (r-1) & (r-2) & \cdots & (1) & r \\ (r+1) & (r) & (r-1) & \cdots & (2) & (1) \end{vmatrix}$$

If we expand $\Delta_{r+1}$ in terms of the $(r+1)$st column we have

$$\Delta_{r+1} = (1)\Delta_r - r\Delta_r' \qquad \{27\}$$

where $\Delta_r'$ represents the determinant $\Delta_r$ with every power sum in the $r$-th row
increased by unity. It is only necessary to arrive at some method of designating
these terms if the above recurrence formula is to be applied. This can be
done by inserting the power sum (1) before the other power sums which it is
to multiply. Also in forming $\Delta_r'$ add unity to the first power sums in the expansion
of $\Delta_r$ being careful to retain the previous order. Thus

$\Delta_1 = (1)$

$\Delta_2 = (1)(1) - 1(1 + 1) = (1)(1) - (2)$

$\Delta_3 = (1)[(1)(1) - (2)] - 2[(2)(1) - (3)] = (1)(1)(1) - (1)(2) - 2(2)(1) + 2(3)$

$\Delta_4 = (1)(1)(1)(1) - (1)(1)(2) - 2(1)(2)(1) + 2(1)(3) - 3(2)(1)(1) + 3(2)(2) + 6(3)(1) - 6(4)$ {28}

$\Delta_5 = (1)(1)(1)(1)(1) - (1)(1)(1)(2) - 2(1)(1)(2)(1) + 2(1)(1)(3)$
$\qquad - 3(1)(2)(1)(1) + 3(1)(2)(2) + 6(1)(3)(1) - 6(1)(4)$
$\qquad - 4(2)(1)(1)(1) + 4(2)(1)(2) + 8(2)(2)(1) - 8(2)(3)$
$\qquad + 12(3)(1)(1) - 12(3)(2) - 24(4)(1) + 24(5)$

etc.

By collection of repeated terms and recalling that $(1^r) = \Delta_r$, the expansion
becomes

$(1) = (1)$

$(1^2) = (1)^2 - (2)$

$(1^3) = (1)^3 - 3(2)(1) + 2(3)$

$(1^4) = (1)^4 - 6(2)(1)^2 + 3(2)^2 + 8(3)(1) - 6(4)$ {29}

$(1^5) = (1)^5 - 10(2)(1)^3 + 15(2)^2(1) + 20(3)(1)^2 - 20(3)(2) - 30(4)(1) + 24(5).$
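The recursion {27} lends itself to mechanical application. In the sketch below each ordered product of power sums is stored as a tuple of integers, so that (1)(2)(2) is `(1, 2, 2)`; prefixing the power sum (1) and raising the first power sum are the two operations of the recursion. The representation and the function names are choices made for this illustration only.

```python
from collections import Counter


def next_delta(delta, r):
    """Apply {27}: prefix (1) to every term, and subtract r times every
    term with its first power sum increased by unity."""
    out = Counter()
    for term, coeff in delta.items():
        out[(1,) + term] += coeff
        out[(term[0] + 1,) + term[1:]] -= r * coeff
    return out


def delta(r):
    """Ordered expansion of Delta_r, as in {28}."""
    d = Counter({(1,): 1})
    for k in range(1, r):
        d = next_delta(d, k)
    return d


def collected(r):
    """Expansion of (1^r) after collecting repeated terms, as in {29}."""
    out = Counter()
    for term, coeff in delta(r).items():
        out[tuple(sorted(term, reverse=True))] += coeff
    return dict(out)
```

The sum of the collected coefficients and the sum of their absolute values can then be compared with 0 and $r!$ as a guard against arithmetic slips.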










It is possible to write values of $(1^r)$ in terms of power sums though the practical
difficulty increases as $r$ increases. Also continued use of the recursion formula
{27} is apt to lead to error. Two simple checks are available. If $D_r$ represents
the sum of the coefficients of the expansion of $(1^r)$ and $|D_r|$ represents the sum
of the absolute values of these coefficients, then

$$D_r = 0 \quad \text{when } r > 1 \qquad \{30\}$$

$$|D_r| = r! \qquad \{31\}$$

The proof of {30} and {31} follows directly from {27} since the coefficients
of $\Delta_r$ and $\Delta_r'$ are the same. Thus $D_{r+1} = (1 - r)D_r$ and $|D_{r+1}| = (1 + r)|D_r|$.
Since $D_2 = 0$ it follows that $D_3, D_4, \ldots, D_r = 0$ and since $|D_2| = 2!$ it follows
that $|D_3|, |D_4|, \ldots, |D_r|$ are $3!, 4!, \ldots, r!$ respectively.

26. Determination of the Coefficient of Any Ordered Product of Power Sums
in the Expansion of $\Delta_r$. We next attempt to revise the process outlined above
so as to get the formulas {29} without going through the work of writing out
{28}. We note first that every product of power sums in the expansion of
$(1^r)$ in {28} has been obtained from (1) by a succession of $r - 1$ operations
which were either prefixes (when the (1) was prefixed) or raises (when the (1)
was added). Also the order of the power sums in a given term indicates which
operations have been prefixes and which raises. For example (1)(1)(1)(1)(1)
results from 4 prefixes while (5) results from 4 raises. The term (3)(2) results
from 1 raise, 1 prefix, and 2 raises respectively, while the term (2)(3) results
from 2 raises, a prefix, and a raise. The product $(p_4)(p_3)(p_2)(p_1)$ results from
prefixes when $r = p_1$, $r = p_1 + p_2$, $r = p_1 + p_2 + p_3$ and raises at all other
times.

The sign of the coefficient of $(p_4)(p_3)(p_2)(p_1)$ can be determined when we
recall that each raise is accompanied by a multiplication by $-r$ while each
prefix is accompanied by no change in the coefficient. There have been
$p_4 - 1 + p_3 - 1 + p_2 - 1 + p_1 - 1 = p_4 + p_3 + p_2 + p_1 - 4$ raises so the sign
is $(-1)^{r-4}$ where $p_4 + p_3 + p_2 + p_1 = r$. More generally if $(p_\rho) \cdots (p_3)(p_2)(p_1)$
is a term in the expansion of $(1^r)$ where $p_\rho + \cdots + p_3 + p_2 + p_1 = r$ the
number of changes in the sign is $p_1 - 1 + p_2 - 1 + \cdots + p_\rho - 1 = p_1 + p_2 + \cdots + p_\rho - \rho = r - \rho$.
It follows at once that those products of power
sums in the expansion of $(1^r)$ which have the same number of factors, $\rho$, also
have the same sign and that this sign is $(-1)^{r-\rho}$.

In determining the numerical part of the coefficient we note that each prefix
is accompanied by a multiplication by unity which can be written in the form $r/r$.
Each raise is accompanied by a multiplication by $r$ so there appears in the
numerator the product of all possible values of $r$ and in the denominator the
product of those values of $r$ corresponding to each prefix. For example the
numerical coefficient of $(p_4)(p_3)(p_2)(p_1)$ is

$$\frac{(p_4 + p_3 + p_2 + p_1 - 1)!}{(p_3 + p_2 + p_1)(p_2 + p_1)(p_1)} = \frac{(p_4 + p_3 + p_2 + p_1)!}{(p_4 + p_3 + p_2 + p_1)(p_3 + p_2 + p_1)(p_2 + p_1)(p_1)}$$

Similarly the coefficient, without sign, of $(p_\rho)(p_{\rho-1}) \cdots (p_3)(p_2)(p_1)$ in the
expansion of $(1^r)$ is

$$\frac{(p_\rho + p_{\rho-1} + \cdots + p_3 + p_2 + p_1)!}{(p_\rho + p_{\rho-1} + \cdots + p_3 + p_2 + p_1)(p_{\rho-1} + \cdots + p_3 + p_2 + p_1) \cdots (p_3 + p_2 + p_1)(p_2 + p_1)(p_1)} \qquad \{32\}$$

The denominator of {32} has a certain resemblance to a factorial. Thus
$4! = (1 + 1 + 1 + 1)(1 + 1 + 1)(1 + 1)(1)$ in which the successive factors
are found by dropping the first unit. The corresponding algebraic expression
$(p_4 + p_3 + p_2 + p_1)(p_3 + p_2 + p_1)(p_2 + p_1)(p_1)$ is found in the same way
and might be called an “algebraic factorial.” It might be designated by

$$(p_4 + p_3 + p_2 + p_1)¡$$

It should be noted that the order of the terms in the algebraic factorial is significant.
Thus $(p_2 + p_1)¡ \ne (p_1 + p_2)¡$ unless $p_1 = p_2$.

The coefficient of $(p_\rho)(p_{\rho-1}) \cdots (p_2)(p_1)$ in the expansion of $(1^r)$ may now
be written

$$(-1)^{r-\rho}\,\frac{r!}{(p_\rho + \cdots + p_2 + p_1)¡} \qquad \{33\}$$

For example the coefficient of

(2)(1)(2) is $(-1)^{5-3}\,\dfrac{5!}{5 \cdot 3 \cdot 2} = 4$

(1)(2)(2) is $(-1)^{5-3}\,\dfrac{5!}{5 \cdot 4 \cdot 2} = 3$

(2)(2)(1) is $(-1)^{5-3}\,\dfrac{5!}{5 \cdot 3 \cdot 1} = 8$

and the total coefficient of all terms involving (2)(2)(1) is 15.
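The rule {33} can be written out as a short computation. In this sketch an ordered product $(p_\rho)\cdots(p_2)(p_1)$ is passed as the tuple of its power sums read from left to right; the names `algebraic_factorial` and `ordered_coefficient` are illustrative.

```python
from itertools import permutations
from math import factorial, prod


def algebraic_factorial(ps):
    """Product of the successive sums obtained by dropping the leading term,
    e.g. (2, 1, 2) -> (2 + 1 + 2)(1 + 2)(2) = 30."""
    return prod(sum(ps[i:]) for i in range(len(ps)))


def ordered_coefficient(ps):
    """Coefficient of the ordered product (p_rho)...(p_1) in (1^r), by {33}."""
    r, rho = sum(ps), len(ps)
    return (-1) ** (r - rho) * factorial(r) // algebraic_factorial(ps)


# total coefficient of all distinct orderings of (2)(2)(1)
total_221 = sum(ordered_coefficient(order) for order in set(permutations((2, 2, 1))))
```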

With a less formal notation we might designate the sum of the $\rho!$ “algebraic
factorials” which can be formed from $p_\rho, p_{\rho-1}, \ldots, p_2, p_1$ by

$$\sum (p_\rho + p_{\rho-1} + \cdots + p_2 + p_1)¡$$

and the sum of their reciprocals by

$$\sum \frac{1}{(p_\rho + p_{\rho-1} + \cdots + p_2 + p_1)¡}$$

This notation calls for the inclusion of all the $\rho!$ algebraic factorials even though
some of them may be alike. If $\pi_1, \pi_2, \ldots, \pi_s$ indicate the numbers of repeated $p$’s

$$\sum \frac{1}{(p_\rho + p_{\rho-1} + \cdots + p_2 + p_1)¡} = \pi_1!\,\pi_2! \cdots \pi_s!\;{\sum}' \frac{1}{(p_\rho + p_{\rho-1} + \cdots + p_2 + p_1)¡} \qquad \{34\}$$

where $\sum'$ holds for the $\dfrac{\rho!}{\pi_1!\,\pi_2! \cdots \pi_s!}$ non-repeated terms.

In general the total coefficient of $(p_1)(p_2) \cdots (p_\rho)$ in the expansion
of $(1^r)$ is obtained by adding all possible terms {33} in which the same $p$’s
occur in different positions in the product. Every possible different position
grouping of the $p$’s is present but once since it is dependent solely on the unique
order in which prefixes and raises have been combined to produce that particular
position grouping. The number of these position groupings varies with the
number of repeated $p$’s. The sum of the coefficients of these position groupings
of the same $p$’s, i.e. the total coefficient of $(p_1)(p_2) \cdots (p_\rho)$ is then given by

$$(-1)^{r-\rho}\, r!\,{\sum}' \frac{1}{(p_\rho + \cdots + p_1)¡}$$

which can be written by means of {34}

$$(-1)^{r-\rho}\,\frac{r!}{\pi_1!\,\pi_2! \cdots \pi_s!} \sum \frac{1}{(p_\rho + \cdots + p_1)¡}.$$

The formula for $(1^r)$ may be written

$$(1^r) = \sum (-1)^{r-\rho}\,\frac{r!}{\pi_1!\,\pi_2! \cdots \pi_s!} \left[\sum \frac{1}{(p_\rho + \cdots + p_1)¡}\right] (p_1)(p_2) \cdots (p_\rho) \qquad \{35\}$$


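27. Theorem on Algebraic Factorials. The result {35} can be further
simplified by the theorem

$$\sum \frac{1}{(p_\rho + \cdots + p_2 + p_1)¡} = \frac{1}{p_\rho\, p_{\rho-1} \cdots p_2\, p_1} \qquad \{36\}$$

which is proved by mathematical induction.

A. It is true when $\rho = 2$, since

$$\frac{1}{(p_2 + p_1)¡} + \frac{1}{(p_1 + p_2)¡} = \frac{1}{p_1 + p_2}\left[\frac{1}{p_1} + \frac{1}{p_2}\right] = \frac{1}{p_2\, p_1}.$$

B. If it is true for $\rho = k$, it is true for $\rho = k + 1$ since

$$\sum_{k+1} \frac{1}{(p_{k+1} + p_k + \cdots + p_2 + p_1)¡} = \frac{1}{p_{k+1} + \cdots + p_2 + p_1}\left[\sum_{k} \frac{1}{(p_k + \cdots + p_1)¡} + \sum_{p_{k+1}} \sum_{k} \frac{1}{(p_k + p_{k-1} + \cdots + p_1)¡}\right]$$

where $\sum_{p_{k+1}}$ gives the $k$ terms in which $p_{k+1}$ replaces $p_k, p_{k-1}, \ldots, p_2, p_1$ respectively.
Now if {36} is true when $\rho = k$

$$\sum_{k+1} \frac{1}{(p_{k+1} + \cdots + p_1)¡} = \frac{1}{p_{k+1} + \cdots + p_1} \cdot \frac{p_{k+1} + p_k + p_{k-1} + \cdots + p_2 + p_1}{p_{k+1}\, p_k\, p_{k-1} \cdots p_2\, p_1} = \frac{1}{p_{k+1}\, p_k \cdots p_2\, p_1}.$$

C. Hence it is true when $k = 2, 3, 4, \ldots$.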
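The theorem on algebraic factorials can be spot-checked with exact rational arithmetic. The sketch below, with illustrative names, sums the reciprocals over all $\rho!$ orderings, counting repeated orderings as often as they occur, and compares the result with the reciprocal of the product of the $p$'s.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def algebraic_factorial(ps):
    """Product of the successive sums obtained by dropping the leading term."""
    return prod(sum(ps[i:]) for i in range(len(ps)))


def reciprocal_sum(ps):
    """Sum of 1 / (algebraic factorial) over all rho! orderings of the p's."""
    return sum(Fraction(1, algebraic_factorial(order)) for order in permutations(ps))
```

For the parts 2, 2, 1 the three distinct orderings give $1/15 + 1/30 + 1/40 = 1/8$, each occurring twice, so the full sum is $1/4 = 1/(2 \cdot 2 \cdot 1)$.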


28. Formulas for $(1^r)$. Formula {35} may now be written

$$(1^r) = \sum (-1)^{r-\rho}\,\frac{r!}{\pi_1! \cdots \pi_s!}\;\frac{1}{p_1 p_2 \cdots p_\rho}\;(p_1)(p_2) \cdots (p_\rho) \qquad \{37\}$$

or if the $p$’s are ordered it may be written as

$$(1^r) = \sum (-1)^{r-\rho}\,\frac{r!}{\pi_1! \cdots \pi_s!\; p_1^{\pi_1} \cdots p_s^{\pi_s}}\;(p_1)^{\pi_1} \cdots (p_s)^{\pi_s} \qquad \{38\}$$

which is the formula previously given as {23} and {26}. In addition the check
formulas {30} and {31} become

$$\sum (-1)^{r-\rho}\,\frac{r!}{p_1^{\pi_1} \cdots p_s^{\pi_s}\;\pi_1! \cdots \pi_s!} = 0 \qquad \{39\}$$

$$\sum \frac{r!}{p_1^{\pi_1} \cdots p_s^{\pi_s}\;\pi_1! \cdots \pi_s!} = r! \qquad \{40\}$$

These relations {39} and {40} correspond to statements of Cauchy (2), (I; I;
252-3) and to later remarks of Cayley (D; 577). By dividing by $r!$, they become

$$\sum \frac{(-1)^{r-\rho}}{p_1^{\pi_1} \cdots p_s^{\pi_s}\;\pi_1! \cdots \pi_s!} = 0 \qquad \{41\}$$

$$\sum \frac{1}{p_1^{\pi_1} \cdots p_s^{\pi_s}\;\pi_1! \cdots \pi_s!} = 1 \qquad \{42\}$$


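The formula {38} is easily applied. Thus

$$(1^5) = \frac{5!}{5}(5) - \frac{5!}{4}(4)(1) - \frac{5!}{3 \cdot 2}(3)(2) + \frac{5!}{3 \cdot 2!}(3)(1)^2 + \frac{5!}{2^2 \cdot 2!}(2)^2(1) - \frac{5!}{2 \cdot 3!}(2)(1)^3 + \frac{5!}{5!}(1)^5$$

$$= 24(5) - 30(4)(1) - 20(3)(2) + 20(3)(1)^2 + 15(2)^2(1) - 10(2)(1)^3 + (1)^5$$

with

24 - 30 - 20 + 20 + 15 - 10 + 1 = 0
24 + 30 + 20 + 20 + 15 + 10 + 1 = 5!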
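Formula {38} and its two checks translate directly into a few lines of code. The sketch below enumerates the partitions of $r$ and evaluates the coefficient $(-1)^{r-\rho}\, r!/(p_1^{\pi_1}\cdots p_s^{\pi_s}\,\pi_1!\cdots\pi_s!)$ for each; the function names are illustrative.

```python
from collections import Counter
from math import factorial, prod


def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for tail in partitions(n - part, part):
            yield (part,) + tail


def coefficient_38(partition):
    """Coefficient of (p_1)^pi_1 ... (p_s)^pi_s in (1^r), by formula {38}."""
    r, rho = sum(partition), len(partition)
    denom = prod(partition) * prod(factorial(m) for m in Counter(partition).values())
    return (-1) ** (r - rho) * (factorial(r) // denom)


def expansion_of_units(r):
    """All coefficients of the expansion of (1^r), keyed by partition."""
    return {p: coefficient_38(p) for p in partitions(r)}
```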





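We next write the formula $(1^r)$ in such form that we use the principle of
generalization from symmetry. If we multiply numerator and denominator
of {38} by $(p_1 - 1)!^{\pi_1}(p_2 - 1)!^{\pi_2} \cdots (p_s - 1)!^{\pi_s}$ we get

$$(1^r) = \sum (-1)^{r-\rho}(p_1 - 1)!^{\pi_1} \cdots (p_s - 1)!^{\pi_s}\;\frac{r!\,(p_1)^{\pi_1} \cdots (p_s)^{\pi_s}}{(p_1!)^{\pi_1}(p_2!)^{\pi_2} \cdots (p_s!)^{\pi_s}\;\pi_1! \cdots \pi_s!}$$

which immediately becomes

$$(1^r) = \sum (-1)^{r-\rho}(p_1 - 1)!^{\pi_1} \cdots (p_s - 1)!^{\pi_s} \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}} (p_1)^{\pi_1} \cdots (p_s)^{\pi_s}. \qquad \{43\}$$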


This somewhat formidable appearing formula is easy to apply. For example,
in finding the value of $(1^5)$ we write in one row all possible partitions of 5.
In the next row we place the well known values of $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$. In the next
row we place the indicated products with proper signs. Thus

    1^5    21^3   2^2 1   31^2     32     41      5
      1     10     15      10      10      5      1
      1     -1     +1      +2      -2     -6    +24

results in

$$(1^5) = (1)^5 - 10(2)(1)^3 + 15(2)^2(1) + 20(3)(1)^2 - 20(3)(2) - 30(4)(1) + 24(5)$$

as indicated above.

It is immediately recognized that formula {43} can be obtained from formula
{3} by placing $P(1^r) = (1^r)$; replacing $p_1^{\pi_1} \cdots p_s^{\pi_s}$ by $(p_1)^{\pi_1} \cdots (p_s)^{\pi_s}$ and
$P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ by $(-1)^{r-\rho}(p_1 - 1)!^{\pi_1} \cdots (p_s - 1)!^{\pi_s}$ and hence that formulas of Table I
may be used in obtaining the values of $(1^r)$.


29. Values of $(a_1 \cdots a_r)$. The form of {43} also permits generalization
from symmetry since the $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ equal values $(p_1)^{\pi_1} \cdots (p_s)^{\pi_s}$ are replaced
by the $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ different values composing $T(p_1)^{\pi_1} \cdots (p_s)^{\pi_s}$ when
the $r$ units are replaced by the $a$’s. It follows at once that

$$(a_1 a_2 \cdots a_r) = \sum (-1)^{r-\rho}(p_1 - 1)!^{\pi_1} \cdots (p_s - 1)!^{\pi_s}\;T(p_1)^{\pi_1} \cdots (p_s)^{\pi_s} \qquad \{44\}$$

where

$$r = p_1\pi_1 + p_2\pi_2 + \cdots + p_s\pi_s$$

and

$$\rho = \pi_1 + \pi_2 + \cdots + \pi_s.$$





As an illustration we write

$$(abc) = 2(a + b + c) - (a + b)(c) - (a + c)(b) - (b + c)(a) + (a)(b)(c)$$

as indicated earlier by {18} and

$$(a_1 a_2 a_3 a_4) = -6T(4) + 2T(3)(1) + T(2)(2) - T(2)(1)^2 + T(1)^4$$

$$(a_1 a_2 a_3 a_4 a_5) = 24T(5) - 6T(4)(1) - 2T(3)(2) + 2T(3)(1)^2 + T(2)^2(1) - T(2)(1)^3 + T(1)^5$$

etc.
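Formula {44} can be evaluated numerically by letting each way of grouping the exponents $a_1, \ldots, a_r$ into blocks supply one product of power sums, with coefficient $(-1)^{r-\rho}$ times the product of $(\text{block size} - 1)!$. The sketch below compares this with direct summation over distinct subscripts; the set-partition generator and all names are illustrative choices, not the paper's notation.

```python
from itertools import permutations
from math import factorial, prod


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller


def power_product_sum(exponents, xs):
    """Direct value of (a_1 ... a_r): sum over distinct subscripts."""
    return sum(
        prod(xs[i] ** a for i, a in zip(idx, exponents))
        for idx in permutations(range(len(xs)), len(exponents))
    )


def by_formula_44(exponents, xs):
    """Value of (a_1 ... a_r) from products of power sums, following {44}."""
    total = 0
    for blocks in set_partitions(list(exponents)):
        # each block of size p contributes the factor (-1)^(p-1) (p-1)!
        coeff = prod((-1) ** (len(b) - 1) * factorial(len(b) - 1) for b in blocks)
        total += coeff * prod(sum(x ** sum(b) for x in xs) for b in blocks)
    return total
```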


30. Table of Values of $(a_1 \cdots a_r)$. The values of the power products with
$w \le 6$ are given in Table II which follows the general form of Table I. In
fact Table II may be derived from Table I by placing every

$$P_{p_1^{\pi_1} \cdots p_s^{\pi_s}} = (-1)^{r-\rho}(p_1 - 1)!^{\pi_1} \cdots (p_s - 1)!^{\pi_s}$$

as indicated in the next section.

31. Use of Partition Formulas. By comparing {44} with {4} we see that
{44} can be obtained from {4} by placing

$$P(a_1 a_2 \cdots a_r) = (a_1 a_2 \cdots a_r)$$

$$P_{p_1^{\pi_1} \cdots p_s^{\pi_s}} = (-1)^{r-\rho}(p_1 - 1)!^{\pi_1} \cdots (p_s - 1)!^{\pi_s}$$

and $T\,p_1^{\pi_1} \cdots p_s^{\pi_s} = T(p_1)^{\pi_1} \cdots (p_s)^{\pi_s}$.

It appears then that the values of any power product sum $(a_1 a_2 \cdots a_r)$ can
be obtained by writing the expansion of $P(a_1 \cdots a_r)$ and substituting as indicated.
Thus since

$$P(321) = P_{3}\,6 + P_{21}\,51 + P_{21}\,42 + P_{21}\,33 + P_{111}\,321$$

$$(321) = 2(6) - (5)(1) - (4)(2) - (3)(3) + (3)(2)(1).$$

It is also immediately apparent that Table II can be obtained from Table I
by placing $P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ equal to $(-1)^{r-\rho}(p_1 - 1)!^{\pi_1} \cdots (p_s - 1)!^{\pi_s}$ and that the
main results of Chapter I, including the recursion rule, are applicable to the
present problem.


32. Coefficients of Given Terms in the Expansion of Product Power Sums.
The methods of the last section are also useful in finding the coefficient of any
term in the expansion. For example we wish to find the coefficient of (3)(2)
in the expansion of (2111).


We note that P 



$= P_{31} + 3P_{22}$ and that



TABLE II

Power product sums in terms of products of power sums when W ≤ 6

W = 1

              1
1             1

W = 2

              2     11
2             1
11           -1      1

W = 3

              3     21    111
3             1
21           -1      1
111           2     -3      1

W = 4

              4     31     22    211   1111
4             1
31           -1      1
22           -1             1
211           2     -2     -1      1
1111         -6      8      3     -6      1

W = 5

              5     41     32    311    221   2111  11111
5             1
41           -1      1
32           -1             1
311           2     -2     -1      1
221           2     -1     -2             1
2111         -6      6      5     -3     -3      1
11111        24    -30    -20     20     15    -10      1

W = 6

              6     51     42     33    411    321    222   31^3 2^21^2   21^4    1^6
6             1
51           -1      1
42           -1             1
33           -1                    1
411           2     -2     -1             1
321           2     -1     -1     -1             1
222           2            -3                           1
31^3         -6      6      3      2     -3     -3             1
2^21^2       -6      4      5      2     -1     -4     -1             1
21^4         24    -24    -18     -8     12     20      3     -4     -6      1
1^6        -120    144     90     40    -90   -120    -15     40     45    -15      1

the coefficient of (3)(2) is $P_{31} + 3P_{22}$ where $P_{31} = (-1)^{4-2}\,2! = 2$ and
$P_{22} = (-1)^{4-2} = 1$. Hence the coefficient is $1 \cdot 2 + 3 \cdot 1 = 5$.
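Any row of Table II can be regenerated by grouping the parts of the power product into blocks and collecting the resulting products of power sums. The following sketch does this symbolically; the function name `table_row` and the tuple representation of partitions are illustrative.

```python
from collections import Counter
from math import factorial, prod


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller


def table_row(parts):
    """Coefficients of the products of power sums in the expansion of the
    power product sum with the given parts, keyed by non-increasing tuples."""
    row = Counter()
    for blocks in set_partitions(list(parts)):
        coeff = prod((-1) ** (len(b) - 1) * factorial(len(b) - 1) for b in blocks)
        key = tuple(sorted((sum(b) for b in blocks), reverse=True))
        row[key] += coeff
    return dict(row)
```

For the example of this section, `table_row((2, 1, 1, 1))[(3, 2)]` gives the coefficient of (3)(2) in the expansion of (2111).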

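33. The Expansion of the Monomial Symmetric Function. If $a_1 \ne a_2 \ne a_3 \ne \cdots \ne a_r$
then $M(a_1 \cdots a_r) = (a_1 \cdots a_r)$ and previous results are applicable.
If however the product power sum is of the form

$$(a_1^{\alpha_1} a_2^{\alpha_2} \cdots a_s^{\alpha_s})$$

then

$$M(a_1^{\alpha_1} \cdots a_s^{\alpha_s}) = \frac{(a_1^{\alpha_1} \cdots a_s^{\alpha_s})}{\alpha_1!\,\alpha_2! \cdots \alpha_s!}$$

and

$$M(a_1^{\alpha_1} \cdots a_s^{\alpha_s}) = \frac{1}{\alpha_1!\,\alpha_2! \cdots \alpha_s!} \sum (-1)^{r-\rho}(p_1 - 1)!^{\pi_1} \cdots (p_s - 1)!^{\pi_s}\;T(p_1)^{\pi_1} \cdots (p_s)^{\pi_s} \qquad \{45\}$$

For example

$M(421) = 2(7) - (6)(1) - (5)(2) - (4)(3) + (4)(2)(1)$

$M(322) = (7) - (5)(2) - \tfrac{1}{2}(4)(3) + \tfrac{1}{2}(3)(2)(2)$

$M(2^2 1^2) = -\tfrac{3}{2}(6) + (5)(1) + \tfrac{5}{4}(4)(2) + \tfrac{1}{2}(3)(3) - \tfrac{1}{4}(4)(1)(1) - (3)(2)(1) - \tfrac{1}{4}(2)(2)(2) + \tfrac{1}{4}(2)(2)(1)(1).$

Study will show that the formula {45} is equivalent to one given by Faà de
Bruno (C; 9) and later by Roe (7).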

It is possible to use Table II in finding the expansion of the monomial symmetric
functions. It is only necessary to multiply each term in the expansion
of $(a_1 \cdots a_r)$ by $\dfrac{1}{\alpha_1!\,\alpha_2! \cdots \alpha_s!}$.

The check formulas give, in the case of the monomial symmetric function:

The sum of the coefficients in the expansion is 0.

The sum of the absolute values of the coefficients is $\dfrac{r!}{\alpha_1!\,\alpha_2! \cdots \alpha_s!}$.
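The division by the factorials of the repeated parts, and the two checks just stated, can be carried out with exact fractions. The sketch below is illustrative only; `monomial_expansion` and the tuple keys are names chosen here.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller


def monomial_expansion(parts):
    """Coefficients of the products of power sums in M(parts), following {45}."""
    row = Counter()
    for blocks in set_partitions(list(parts)):
        coeff = prod((-1) ** (len(b) - 1) * factorial(len(b) - 1) for b in blocks)
        key = tuple(sorted((sum(b) for b in blocks), reverse=True))
        row[key] += coeff
    # divide by alpha_1! alpha_2! ... for the repeated parts
    repeats = prod(factorial(m) for m in Counter(parts).values())
    return {key: Fraction(c, repeats) for key, c in row.items()}
```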

The reader might compare the second of these checks with the results of
Faà de Bruno (C; 14).

Tables giving the expansion of monomial symmetric functions have been
given. One by J. R. Roe (12; plate 18) includes all cases of weight ≤ 10.

34. Previous Results. Previous authors have studied the monomial sym¬ 
metric function. Gordan ha s deduced a monomial symmetric function formula 
which is recommended by J. R. Roe (M; 24-33). MacMahon has given a 
general formula (K; II; 320) for expanding any monomial symmetric function 
in terms of power sums together with an operational method for its evaluation. 
O’Toole also has given a differential operator and showed how it could be applied 
in obtaining expansions (16; 115-130). O’Toole has also given a method of 
expanding symmetric functions in many variables by means of differential 
operators, (17). 

Another method of attack was based upon the close relation existing between
the elementary symmetric function and the determinant of the power sums.
This has resulted in the expression of the monomial symmetric function in
determinant form. Brioschi appears to have been the first (1854) to see how
a symbolic determinant could be used (3; 427), although he gave no proof.
Bellavitis tried in 1857, but obtained incorrect results (4). In 1876 Faà de
Bruno made an attempt, but he too was in error (C; 10). In 1898 E. D. Roe, Jr.
proved that Brioschi was right (7). Muir also gave a proof in 1908 (11; 5-9).
The summation of determinants, rather than the symbolic determinant, was
used by Hankel (6; 90-94) (L; III, 220).

The determinant of the power sums has been generalized in another way.
A group of writers has studied the "immanents" of its matrix. D. E. Littlewood
and A. R. Richardson have recently written a series of papers on this
topic. One of these papers (18; 99-141) defined the term "immanents" and
gave references to previous investigations dealing with this matrix.

It has been the aim of this chapter to present an easy development of the 
subject of the expansion of product power sums and monomial symmetric func¬ 
tions. This development is characterized by 

1. The use of the formulas and tables of Chapter I in writing expansions of
product power sums,

2. The use of product power sums in place of monomial symmetric functions
which makes feasible

3. Generalization from symmetry, 

4. References to previous work. 

Chapter IV. The Double Expansion Theorem

In the present chapter we combine the multiplication expansion of Chapter II
and the power product sum expansion of Chapter III into a new result which
is to be known as the double expansion theorem. We show that this result
may also be expressed in terms of the partition notation of Chapter I.

35. The Value of $K(a_1)(a_2)$. We know

$$(a_1)(a_2) = (a_1 + a_2) + (a_1 \cdot a_2)$$

and if we multiply $(a_1 + a_2)$ by $k_2$, and $(a_1 \cdot a_2)$ by $k_{11}$, we have a new
expression which we designate by $K(a_1)(a_2)$.

$$K(a_1)(a_2) = k_2(a_1 + a_2) + k_{11}(a_1 \cdot a_2) \qquad \{46\}$$

Since $(a_1 \cdot a_2) = (a_1)(a_2) - (a_1 + a_2)$,

$$K(a_1)(a_2) = k_2(a_1 + a_2) + k_{11}[(a_1)(a_2) - (a_1 + a_2)]$$

$$K(a_1)(a_2) = (k_2 - k_{11})(a_1 + a_2) + k_{11}(a_1)(a_2)$$

which can be written

$$K(a_1)(a_2) = K_2(a_1 + a_2) + K_{11}(a_1)(a_2) \qquad \{47\}$$

if $k_2 - k_{11} = K_2$ and $K_{11} = k_{11}$.
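The two-power result can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names, the variates, the powers, and the multipliers $k_2$, $k_{11}$ are hypothetical values chosen here, and the power product sum is computed by brute force over distinct subscripts.

```python
from itertools import permutations


def power_sum(xs, a):
    """The power sum (a)."""
    return sum(x ** a for x in xs)


def power_product_sum(xs, exponents):
    """The power product sum (a1 . a2 ...), taken over distinct subscripts only."""
    total = 0
    for subscripts in permutations(range(len(xs)), len(exponents)):
        term = 1
        for i, e in zip(subscripts, exponents):
            term *= xs[i] ** e
        total += term
    return total


xs = [2, 3, 5, 7]   # arbitrary variates (hypothetical)
a1, a2 = 3, 1       # arbitrary powers (hypothetical)
k2, k11 = 4, 9      # arbitrary multipliers (hypothetical)

# Definition {46}: weight each power product sum by its k.
by_definition = k2 * power_sum(xs, a1 + a2) + k11 * power_product_sum(xs, (a1, a2))

# Result {47}: the same quantity written in power sums only.
K2, K11 = k2 - k11, k11
by_theorem = K2 * power_sum(xs, a1 + a2) + K11 * power_sum(xs, a1) * power_sum(xs, a2)
```

The two values agree for any choice of variates, powers, and multipliers, since {47} is an algebraic identity.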





36. The Value of $K(a_1)(a_2)(a_3)$. We know from {12} that

$$(a_1)(a_2)(a_3) = T(3) + T(21) + T(111)$$

and we define $K(a_1)(a_2)(a_3) = k_3 T(3) + k_{21} T(21) + k_{111} T(111)$. Inserting
the values $T_3 = (a_1 + a_2 + a_3)$, $T_{21} = (a_1 + a_2 \cdot a_3) + (a_1 + a_3 \cdot a_2) + (a_2 + a_3 \cdot a_1)$,
$T_{111} = (a_1 \cdot a_2 \cdot a_3)$ and reducing to power sums by {44}, we get

$$K(a_1)(a_2)(a_3) = (k_3 - 3k_{21} + 2k_{111})(a_1 + a_2 + a_3) + (k_{21} - k_{111})\{(a_1 + a_2)(a_3) + (a_1 + a_3)(a_2) + (a_2 + a_3)(a_1)\} + k_{111}(a_1)(a_2)(a_3)$$

which may be written

$$K(a_1)(a_2)(a_3) = K_3(a_1 + a_2 + a_3) + K_{21}\{(a_1 + a_2)(a_3) + (a_1 + a_3)(a_2) + (a_2 + a_3)(a_1)\} + K_{111}(a_1)(a_2)(a_3) \qquad \{48\}$$

where $K_3 = k_3 - 3k_{21} + 2k_{111}$, $K_{21} = k_{21} - k_{111}$, $K_{111} = k_{111}$.

37. Definition of $K(a_1)(a_2) \cdots (a_r)$. We define

$$K(a_1)(a_2) \cdots (a_r) = \sum k_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, T(p_1^{\pi_1} \cdots p_s^{\pi_s}) \qquad \{49\}$$

where $T(p_1^{\pi_1} \cdots p_s^{\pi_s})$ is composed of $\binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ power product sums. We
wish to find the value of $K(a_1)(a_2) \cdots (a_r)$ in terms of power sums. This involves
the expansion of each power product sum in terms of power sums and
then the collection of the results. This algebraic process is to be called the
double expansion process and the theorem which results, the double expansion
theorem.


38. Special Cases of the Theorem. The results {47} and {48} are special
cases of the double expansion theorem when r = 2, 3. When r = 1 it is evident
that

$$K(a_1) = K_1(a_1) = k_1(a_1). \qquad \{50\}$$

The results {50}, {47}, and {48} may be written symbolically by

$$K(a_1) = K_1 T(1)$$

$$K(a_1)(a_2) = K_2 T(2) + K_{11} T(1)^2 \qquad \{51\}$$

$$K(a_1)(a_2)(a_3) = K_3 T(3) + K_{21} T(2)(1) + K_{111} T(1)^3$$

It can also be shown, with a much more extensive use of the results of Chapters II
and III, that

$$K(a_1)(a_2)(a_3)(a_4) = K_4 T(4) + K_{31} T(3)(1) + K_{22} T(2)^2 + K_{211} T(2)(1)^2 + K_{1111} T(1)^4 \qquad \{52\}$$

$$K(a_1) \cdots (a_5) = K_5 T(5) + K_{41} T(4)(1) + K_{32} T(3)(2) + K_{311} T(3)(1)^2 + K_{221} T(2)^2(1) + K_{2111} T(2)(1)^3 + K_{11111} T(1)^5 \qquad \{53\}$$





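where

$$\begin{aligned}
K_4 &= k_4 - 4k_{31} - 3k_{22} + 12k_{211} - 6k_{1111} \\
K_{31} &= k_{31} - 3k_{211} + 2k_{1111} \\
K_{22} &= k_{22} - 2k_{211} + k_{1111} \\
K_{211} &= k_{211} - k_{1111} \\
K_{1111} &= k_{1111}
\end{aligned} \qquad \{54\}$$

$$\begin{aligned}
K_5 &= k_5 - 5k_{41} - 10k_{32} + 20k_{311} + 30k_{221} - 60k_{2111} + 24k_{11111} \\
K_{41} &= k_{41} - 4k_{311} - 3k_{221} + 12k_{2111} - 6k_{11111} \\
K_{32} &= k_{32} - k_{311} - 3k_{221} + 5k_{2111} - 2k_{11111} \\
K_{311} &= k_{311} - 3k_{2111} + 2k_{11111} \\
K_{221} &= k_{221} - 2k_{2111} + k_{11111} \\
K_{2111} &= k_{2111} - k_{11111} \\
K_{11111} &= k_{11111}
\end{aligned} \qquad \{55\}$$

We may say then that, for r < 6,

$$K(a_1) \cdots (a_r) = \sum k_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, T(p_1^{\pi_1} \cdots p_s^{\pi_s}) = \sum K_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, T(p_1)^{\pi_1} \cdots (p_s)^{\pi_s} \qquad \{56\}$$

where $K_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ is defined by the relations {47}, {48}, {54}, and {55}. In
examining the value of $K_r$ we note

$$\begin{aligned}
K_1 &= k_1 \\
K_2 &= k_2 - k_{11} \\
K_3 &= k_3 - 3k_{21} + 2k_{111} \\
K_4 &= k_4 - 4k_{31} - 3k_{22} + 12k_{211} - 6k_{1111} \\
K_5 &= k_5 - 5k_{41} - 10k_{32} + 20k_{311} + 30k_{221} - 60k_{2111} + 24k_{11111}
\end{aligned}$$

and that these are given, for r < 6, by

$$K_r = \sum (-1)^{p-1}(p-1)! \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}} k_{p_1^{\pi_1} \cdots p_s^{\pi_s}} \qquad \{57\}$$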

It is further to be noted that {57} can be obtained from {3} by replacing $P(1^r)$
by $K_r$, $p_1^{\pi_1} \cdots p_s^{\pi_s}$ by $k_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$, and $P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ by $(-1)^{p-1}(p-1)!$. Hence the
last rows of Table I may be used in writing the values of $K_r$. Thus from

$$P(1^3) = P_3(3) + 3P_{21}(21) + P_{111}(1)^3$$

we get

$$K_3 = k_3 - 3k_{21} + 2k_{111}.$$

It is further evident that if $K_3 K_2 = (k_3 - 3k_{21} + 2k_{111})(k_2 - k_{11})$ indicates
multiplication by suffixing of subscripts, then

$$K_3 K_2 = k_{32} - k_{311} - 3k_{221} + 5k_{2111} - 2k_{11111} = K_{32}$$

and in general it can be shown that for r < 6

$$K_{r_1 r_2} = K_{r_1} K_{r_2}$$

$$K_{r_1 r_2 r_3} = K_{r_1} K_{r_2} K_{r_3} \qquad \{58\}$$

etc.

so that all values $K_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ may be obtained by symbolic multiplication of
equations {57}.

The method of this section can be used in demonstrating that the results
{56}, {57}, and {58} hold also when r = 6, 7, 8, $\cdots$, but the amount of algebraic
manipulation increases enormously with each increase in r. We establish
these results, for all integral values of r, by a more general approach.
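Formula {57} lends itself to direct computation for any r. The following Python sketch is an illustration, not part of the original text: the function names are hypothetical, and the count `ways` is assumed to be the number of ways in which r distinct units fall into groups of the sizes given by the partition, which is how the bracketed symbol in {57} is read here.

```python
from math import factorial


def integer_partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for tail in integer_partitions(n - part, part):
            yield (part,) + tail


def ways(partition):
    """Number of ways r distinct units form groups of the given sizes."""
    count = factorial(sum(partition))
    for part in partition:
        count //= factorial(part)
    for size in set(partition):
        count //= factorial(partition.count(size))
    return count


def K_coefficients(r):
    """Map each partition of r to the coefficient of its k in K_r, per {57}."""
    return {
        p: (-1) ** (len(p) - 1) * factorial(len(p) - 1) * ways(p)
        for p in integer_partitions(r)
    }
```

For r = 4 and r = 5 the dictionaries reproduce the coefficients of $K_4$ and $K_5$ listed above, and the same call may be used for r = 6, 7, 8 where hand computation becomes laborious.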

39. A More General Definition. We provide a more general definition of
$K(a_1) \cdots (a_r)$ by letting the subscripts of the k's agree with parts of the given
partition rather than with its complete order. Thus

$$K'(a_1)(a_2) = k_{a_1 + a_2}(a_1 + a_2) + k_{a_1 \cdot a_2}(a_1 \cdot a_2)$$

and in general, if $q_1^{\chi_1} \cdots q_t^{\chi_t}$ represents any p part partition having complete
order $p_1^{\pi_1} \cdots p_s^{\pi_s}$, then we may define

$$K'(a_1)(a_2) \cdots (a_r) = \sum k_{q_1^{\chi_1} \cdots q_t^{\chi_t}}\, (q_1^{\chi_1} \cdots q_t^{\chi_t}) \qquad \{59\}$$

where the summation holds, not only for every different complete order as
does {49}, but for every possible partition. By {44}, $(q_1^{\chi_1} \cdots q_t^{\chi_t})$ may be
written as

$$(q_1^{\chi_1} \cdots q_t^{\chi_t}) = \sum (-1)^{p-g}(d_1 - 1)!(d_2 - 1)! \cdots (d_g - 1)!\, T(d_1) \cdots (d_g)$$

where $d_1 + d_2 + \cdots + d_g = p$ and where groups of the d's may be alike. If
$(w_1)(w_2) \cdots (w_g)$ is one of the products of power sums having the complete
order $(d_1 \cdots d_g)$ we may write

$$(q_1^{\chi_1} \cdots q_t^{\chi_t}) = \sum (-1)^{p-g}(d_1 - 1)! \cdots (d_g - 1)!\, (w_1)(w_2) \cdots (w_g) \qquad \{60\}$$

where

$$\chi_1 q_1 + \cdots + \chi_t q_t = w = w_1 + w_2 + \cdots + w_g$$

and

$$d_1 + \cdots + d_g = p$$

and where the summation sign holds not only for every complete order $d_1 \cdots d_g$,
but for all power sum partition products $(w_1) \cdots (w_g)$.

The insertion of {60} in {59} gives

$$K'(a_1)(a_2) \cdots (a_r) = \sum \sum (-1)^{p-g}(d_1 - 1)! \cdots (d_g - 1)!\, k_{q_1^{\chi_1} \cdots q_t^{\chi_t}}\, (w_1) \cdots (w_g) \qquad \{61\}$$

40. Value of $K'_w$. The notation $K'_w$ is used to indicate the coefficient of
the power sum $(w) = (a_1 + a_2 + \cdots + a_r)$ in the expansion of {61}. In this
case $d_1 = p$ and $g = 1$ so that

$$K'_w = \sum (-1)^{p-1}(p-1)!\, k_{q_1^{\chi_1} \cdots q_t^{\chi_t}} \qquad \{62\}$$

which may be written more symbolically as

$$K'_w = \sum (-1)^{p-1}(p-1)!\, k_{\pi_w} \qquad \{63\}$$

where $\pi_w$ represents any algebraic partition of $a_1 + \cdots + a_r$ and p indicates
the number of its parts.


41. Products of $K'_w$'s. The notation $K'_{w_1} K'_{w_2}$ is used to indicate the product
of $K'_{w_1}$ by $K'_{w_2}$ if the rule of multiplication is the suffixing of the subscripts of
the k's in the expansion of $K'_{w_1}$ and $K'_{w_2}$. Thus

$$K'_{a_1 + a_2} K'_{a_3} = (k_{a_1 + a_2} - k_{a_1 \cdot a_2})(k_{a_3}) = k_{a_1 + a_2 \cdot a_3} - k_{a_1 \cdot a_2 \cdot a_3}$$

More generally, if we write, from {63},

$$K'_{w_1} = \sum (-1)^{d_1 - 1}(d_1 - 1)!\, k_{\pi_{w_1}}$$

$$\cdots$$

$$K'_{w_g} = \sum (-1)^{d_g - 1}(d_g - 1)!\, k_{\pi_{w_g}}$$

and use multiplication by suffixing of subscripts we have

$$K'_{w_1} K'_{w_2} \cdots K'_{w_g} = \sum (-1)^{p-g}(d_1 - 1)! \cdots (d_g - 1)!\, k_{\pi_{w_1} \cdots \pi_{w_g}} \qquad \{64\}$$

where $p = d_1 + d_2 + \cdots + d_g$ and the summation holds for every partition
which can be formed by combining any algebraic partition of $w_1$, any partition
of $w_2$, $\cdots$, any partition of $w_g$.

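42. The Coefficient of $(w_1)(w_2) \cdots (w_g)$. The coefficient of any specific
product of power sums $(w_1)(w_2) \cdots (w_g)$ is, from {61},

$$K'_{w_1 w_2 \cdots w_g} = \sum k_{q_1^{\chi_1} \cdots q_t^{\chi_t}} (-1)^{p-g}(d_1 - 1)! \cdots (d_g - 1)! \qquad \{65\}$$

where the summation holds, not only for the partitions of $a_1 + a_2 + \cdots + a_r$,
but for the partitions $\pi_{w_1}, \cdots, \pi_{w_g}$ since these partitions can be combined
to form $(w_1)(w_2) \cdots (w_g)$. Hence {65} becomes

$$K'_{w_1 w_2 \cdots w_g} = \sum (-1)^{p-g}(d_1 - 1)! \cdots (d_g - 1)!\, k_{\pi_{w_1} \cdots \pi_{w_g}} \qquad \{66\}$$

and it is immediately seen that the right hand expressions of {66} and {64}
are the same and hence that

$$K'_{w_1 w_2 \cdots w_g} = K'_{w_1} K'_{w_2} \cdots K'_{w_g}$$

as expected from {58}.

We can now say that

$$K'(a_1)(a_2) \cdots (a_r) = \sum k_{q_1^{\chi_1} \cdots q_t^{\chi_t}}\, (q_1^{\chi_1} \cdots q_t^{\chi_t}) = \sum K'_{w_1 w_2 \cdots w_g}\, (w_1)(w_2) \cdots (w_g) \qquad \{67\}$$

where

$$K'_w = \sum (-1)^{p-1}(p-1)!\, k_{\pi_w} \qquad \{68\}$$

and

$$K'_{w_1 w_2 \cdots w_g} = K'_{w_1} K'_{w_2} \cdots K'_{w_g} \qquad \{69\}$$

Relations {67}, {68} and {69} constitute the general double expansion
theorem.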


43. The Double Expansion Theorem. The case of the double expansion
theorem in which we are especially interested is that in which the coefficients
of all similar power sum products are the same, i.e., $k_{q_1^{\chi_1} \cdots q_t^{\chi_t}}$ is a function of
the complete order indicated by $p_1^{\pi_1} \cdots p_s^{\pi_s}$. In this case {68} becomes

$$K'_w = \sum (-1)^{p-1}(p-1)! \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}} k_{p_1^{\pi_1} \cdots p_s^{\pi_s}} \qquad \{70\}$$

where the summation holds for all possible complete orders. Suppose now
that the r algebraic expressions $a_1, a_2, \cdots, a_r$ are all unity; then {69} becomes

$$K'_r = \sum (-1)^{p-1}(p-1)! \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}} k_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$$

and we find that $K'_w = K_r$. We may then write {67}, {68} and {69} as

$$K(a_1) \cdots (a_r) = \sum k_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, T(p_1^{\pi_1} \cdots p_s^{\pi_s}) = \sum K_{r_1 r_2 \cdots r_g}\, T(r_1) \cdots (r_g) \qquad \{71\}$$

where

$$K_r = \sum (-1)^{p-1}(p-1)! \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}} k_{p_1^{\pi_1} \cdots p_s^{\pi_s}} \qquad \{72\}$$

and

$$K_{r_1 r_2 \cdots r_g} = K_{r_1} K_{r_2} \cdots K_{r_g} \qquad \{73\}$$

Now $r_1 r_2 \cdots r_g$ indicates any grouping of the a's, and hence any complete
order of $a_1 + a_2 + \cdots + a_r$. So {71} may be written, with a slight change
of notation, as

$$K(a_1) \cdots (a_r) = \sum k_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, T(p_1^{\pi_1} \cdots p_s^{\pi_s}) = \sum K_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, T(p_1)^{\pi_1} \cdots (p_s)^{\pi_s} \qquad \{74\}$$

The relations {74}, {72} and {73} are the desired generalizations of {56}, {57}
and {58} and hold for all positive integral values of r.

The double expansion theorem provides a method of writing out the result
of the double expansion process without going through the work involved in
the process. Thus

$$\begin{aligned}
K(3)(2)(1) &= K_3(6) + K_{21}\{(5)(1) + (4)(2) + (3)(3)\} + K_{111}(3)(2)(1) \\
&= (k_3 - 3k_{21} + 2k_{111})(6) + (k_{21} - k_{111})\{(5)(1) + (4)(2) + (3)(3)\} + k_{111}(3)(2)(1)
\end{aligned} \qquad \{75\}$$
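The double expansion theorem can be tested against the double expansion process itself. The Python sketch below is a hedged illustration: every function name is hypothetical, the multipliers `k` are an arbitrary function of the complete order chosen only for the test, and both sides are evaluated by brute force on a few arbitrary variates. It compares the defining sum of weighted power product sums with the sum of power sum products weighted by the K's of {72} and {73}.

```python
from itertools import permutations, product
from math import factorial


def set_partitions(items):
    """Yield every grouping of the items into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for grouping in set_partitions(rest):
        for i in range(len(grouping)):
            yield grouping[:i] + [[first] + grouping[i]] + grouping[i + 1:]
        yield [[first]] + grouping


def integer_partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for tail in integer_partitions(n - part, part):
            yield (part,) + tail


def ways(partition):
    """Number of ways r distinct units form groups of the given sizes."""
    count = factorial(sum(partition))
    for part in partition:
        count //= factorial(part)
    for size in set(partition):
        count //= factorial(partition.count(size))
    return count


def k(order):
    """Hypothetical multipliers: any function of the complete order will do."""
    return 3 * len(order) + sum(part * part for part in order)


def K(sizes):
    """K_{r1 r2 ...} = K_{r1} K_{r2} ..., multiplying by suffixing of subscripts."""
    total = 0
    for combo in product(*(list(integer_partitions(r)) for r in sizes)):
        coefficient, suffix = 1, []
        for p in combo:
            coefficient *= (-1) ** (len(p) - 1) * factorial(len(p) - 1) * ways(p)
            suffix.extend(p)
        total += coefficient * k(tuple(sorted(suffix, reverse=True)))
    return total


def power_sum(xs, a):
    return sum(x ** a for x in xs)


def power_product_sum(xs, exponents):
    """Sum over distinct subscripts of the product of powers."""
    total = 0
    for subscripts in permutations(range(len(xs)), len(exponents)):
        term = 1
        for i, e in zip(subscripts, exponents):
            term *= xs[i] ** e
        total += term
    return total


def order_of(grouping):
    return tuple(sorted((len(block) for block in grouping), reverse=True))


def double_expansion_direct(xs, powers):
    """The process: weight every power product sum by k of its complete order."""
    return sum(
        k(order_of(g)) * power_product_sum(xs, [sum(b) for b in g])
        for g in set_partitions(list(powers))
    )


def double_expansion_theorem(xs, powers):
    """The theorem: products of power sums weighted by the K's."""
    total = 0
    for g in set_partitions(list(powers)):
        term = K(order_of(g))
        for block in g:
            term *= power_sum(xs, sum(block))
        total += term
    return total


xs = [1, 2, 3, 4]  # arbitrary variates (hypothetical)
direct = double_expansion_direct(xs, (3, 2, 1))
via_theorem = double_expansion_theorem(xs, (3, 2, 1))
```

With powers (3, 2, 1) the second function evaluates exactly the right member of {75}; the agreement of the two values is what the theorem asserts.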


44. The Double Expansion Theorem and Partition Notation. It is immediately
evident that {74} can be obtained from {4} if $P(a_1 \cdots a_r)$ is replaced
by $K(a_1) \cdots (a_r)$, if $P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ is replaced by $K_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$, and if $p_1^{\pi_1} \cdots p_s^{\pi_s}$ is
replaced by $T(p_1)^{\pi_1} \cdots (p_s)^{\pi_s}$. It follows at once that the entire theory of
Chapter I (table, recursion formula, etc.) is applicable to double expansion
theory. For example {75} above is obtained from

$$P(321) = P_3\, 6 + P_{21}\{51 + 42 + 33\} + P_{111}\, 321$$

simply by replacing the P's by the K's and enclosing the parts in parentheses.
We can as well use P's as K's to represent the double expansion theorem and
hence have available a list of double expansion formulas when $w \le 6$. We
also have available a recursion property for writing double expansions beyond
the scope of the table. Thus for example, the illustration at the end of section 9
may be interpreted as a statement of the double expansion theorem
when $a_1 = 3$, $a_2 = 2$, $a_3 = 2$, $a_4 = 1$.


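45. The Case of Equal Powers. In case $a_1 = a_2 = a_3 = \cdots = a_r = a$, {74}
reduces to {3'} of Chapter I with

$$P_r = \sum (-1)^{p-1}(p-1)! \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}} k_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$$

and

$$P_{r_1 r_2 \cdots r_g} = P_{r_1} P_{r_2} \cdots P_{r_g}. \qquad \{76\}$$

Formula {74} also reduces to {3} when $a_1 = a_2 = \cdots = a_r = 1$.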

46. Special Values of $k_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$.

A. $k_{p_1^{\pi_1} \cdots p_s^{\pi_s}} = 1$. In this case the coefficients are all unity and

$$P(a_1)(a_2) \cdots (a_r) = (a_1)(a_2) \cdots (a_r).$$

It follows that $P_r = 0$ and that $P_{p_1^{\pi_1} \cdots p_s^{\pi_s}} = 0$ except that $P_{1^r} = 1$. Placing
$P_r = 0$ and $k_{p_1^{\pi_1} \cdots p_s^{\pi_s}} = 1$ in {72} or its equivalent {76} we have, when r > 1,

$$0 = \sum (-1)^{p-1}(p-1)! \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}} \qquad \{77\}$$

where the summation holds for every partition of r. This formula should be
compared with {39} and {40}. When r = 4 the comparison is as follows:

Partitions       4      31      22     211     1^4
{77} gives       1     - 4     - 3    + 12     - 6    = 0
{39} gives     - 6     + 8     + 3     - 6     + 1    = 0
{40} gives       6     + 8     + 3     + 6     + 1    = 4!

The equivalent of {77} was first given by Cayley (D; 576) who at the same time
noted the similarity to {39}.
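The three rows of the comparison for r = 4 can be regenerated mechanically. In the Python sketch below the function names are hypothetical. Formulas {39} and {40} are not reproduced in this section; from the tabulated values it is assumed here that they involve the number of substitutions on r letters having a given cycle structure, signed in {39} and unsigned in {40}.

```python
from math import factorial


def integer_partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for tail in integer_partitions(n - part, part):
            yield (part,) + tail


def ways(partition):
    """Number of ways r distinct units form groups of the given sizes."""
    count = factorial(sum(partition))
    for part in partition:
        count //= factorial(part)
    for size in set(partition):
        count //= factorial(partition.count(size))
    return count


def cycle_type_count(partition):
    """Number of substitutions on r letters with cycles of the given lengths."""
    count = factorial(sum(partition))
    for part in partition:
        count //= part
    for size in set(partition):
        count //= factorial(partition.count(size))
    return count


def row_77(r):
    """Terms of {77}, one per partition of r."""
    return [(-1) ** (len(p) - 1) * factorial(len(p) - 1) * ways(p)
            for p in integer_partitions(r)]


def row_39(r):
    """Assumed reading of {39}: signed counts of substitutions by cycle structure."""
    return [(-1) ** (r - len(p)) * cycle_type_count(p)
            for p in integer_partitions(r)]


def row_40(r):
    """Assumed reading of {40}: unsigned counts, which total r!."""
    return [cycle_type_count(p) for p in integer_partitions(r)]
```

The partitions are generated in the order 4, 31, 22, 211, $1^4$, so the three lists line up with the columns of the comparison above.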

It follows immediately that the sum of the coefficients in the expansion of
$P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$, except $P_{1^r}$, is 0, for the sum of the coefficients of $P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ is the sum
of the coefficients of $(P_{p_1})^{\pi_1} \cdots (P_{p_s})^{\pi_s}$ and is 0. For example the sum of the
coefficients of $P_{32} = k_{32} - k_{311} - 3k_{221} + 5k_{2111} - 2k_{11111}$ is 0.

Since the coefficients of $(\mu'_{p_1})^{\pi_1} \cdots (\mu'_{p_s})^{\pi_s}$ (19; 25) in the expansion of Thiele
half invariants are $(-1)^{p-1}(p-1)! \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ it follows from {77} that the
sum of these coefficients is 0.

B. $k_{p_1^{\pi_1} \cdots p_s^{\pi_s}} = \rho_p$. In this case all terms having the same number of parts,
p, have the same coefficients. If we indicate $k_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ by $\rho_p$ when p = 1,
2, $\cdots$, {57}, {76} become

$$\begin{aligned}
P_1 &= \rho_1 \\
P_2 &= \rho_1 - \rho_2 \\
P_3 &= \rho_1 - 3\rho_2 + 2\rho_3 \\
P_4 &= \rho_1 - 7\rho_2 + 12\rho_3 - 6\rho_4 \\
P_5 &= \rho_1 - 15\rho_2 + 50\rho_3 - 60\rho_4 + 24\rho_5,
\end{aligned}$$

etc.

which are the formulas which have been used by Carver (15) and O'Toole (16).
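When the multipliers depend only on the number of parts, the coefficients of {57} may simply be collected by number of parts. The Python sketch below is illustrative; the function names are hypothetical and `ways` is assumed to count the groupings of r distinct units into groups of the given sizes.

```python
from math import factorial


def integer_partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for tail in integer_partitions(n - part, part):
            yield (part,) + tail


def ways(partition):
    """Number of ways r distinct units form groups of the given sizes."""
    count = factorial(sum(partition))
    for part in partition:
        count //= factorial(part)
    for size in set(partition):
        count //= factorial(partition.count(size))
    return count


def P_in_rho(r):
    """Coefficients of rho_1, ..., rho_r in P_r, collecting {57} by number of parts."""
    totals = [0] * r
    for p in integer_partitions(r):
        totals[len(p) - 1] += (-1) ** (len(p) - 1) * factorial(len(p) - 1) * ways(p)
    return totals
```

For example, the three-part partitions of 5 contribute 20 + 30 = 50, the coefficient of $\rho_3$ in $P_5$.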

Many other additional cases can be obtained by giving different values to
$k_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$, but a discussion of these is hardly justified here as the case in which
$k_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$ is a function of the number of parts, p, is to be used in Part II.

47. Relation to Previous Results. No general statement of the double
expansion theorem has previously been given although the special case $K(a^r)$
has been developed by Carver (15) and O'Toole (16). Their results are further
restricted to the special case (B) of section 46. The application of the double
expansion theorem in this case is very useful in studying sampling from a
finite universe as Carver has shown and as is demonstrated in Part II.

Most writers who have worked on the problem of moments of moments have
gone through the double expansion process, but Carver was the first to note
that the result of the process can be written in terms of the P polynomials
above. It seems appropriate therefore to refer to these P polynomials of the
coefficients as Carver polynomials.

Chapter V. The Multipartition and Multivariate Formulas 

It is the purpose of this chapter to show how the results of Chapters I, II,
III, and IV may be extended to the case of different variables.

48. Multipartitions. Tables. Formula {4} is still applicable if we let the
$a_1$ units be the units of one quantity, the $a_2$ units be the units of a second
quantity, etc. Thus for example the formula $P(a_1 a_2 a_3)$ may be used to represent
the precise number of ways in which $a_1$ apples, $a_2$ pears and $a_3$ peaches can
be formed into groups without breaking up the groups of apples, pears, and
peaches.

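Various conventions for representing multipartitions of this type have been
used. We adopt the one in which the individual partitions are written in
successive columns. The partitions of the first number are combined with the
partitions of the second number to form all possible multipartitions. Thus the
multipartite number 111 has the partitions

    111     110     101     011     100
            001     010     100     010
                                    001

where the parts are given in the rows. It is desired to show the number of
ways in which any one of these partitions may be combined to form partitions
of fewer parts. Thus

$$P\begin{pmatrix} 100 \\ 010 \\ 001 \end{pmatrix} = P_3\, 111 + P_{21} \begin{matrix} 110 \\ 001 \end{matrix} + P_{21} \begin{matrix} 101 \\ 010 \end{matrix} + P_{21} \begin{matrix} 011 \\ 100 \end{matrix} + P_{111} \begin{matrix} 100 \\ 010 \\ 001 \end{matrix}$$

This is obtained from $P(a_1 a_2 a_3)$ by placing $a_1 = 1_1$, $a_2 = 1_2$, $a_3 = 1_3$, and could
be written from {4} as

$$P(1_1 1_2 1_3) = P_3(1_1 + 1_2 + 1_3) + P_{21}\{(1_1 + 1_2 \cdot 1_3) + (1_1 + 1_3 \cdot 1_2) + (1_2 + 1_3 \cdot 1_1)\} + P_{111}\, 1_1 \cdot 1_2 \cdot 1_3.$$

Similarly

$$P\begin{pmatrix} 10 \\ 10 \\ 01 \\ 01 \end{pmatrix} = P_4\, 22 + 2P_{31} \begin{matrix} 21 \\ 01 \end{matrix} + 2P_{31} \begin{matrix} 12 \\ 10 \end{matrix} + P_{22} \begin{matrix} 20 \\ 02 \end{matrix} + 2P_{22} \begin{matrix} 11 \\ 11 \end{matrix} + P_{211} \begin{matrix} 20 \\ 01 \\ 01 \end{matrix} + P_{211} \begin{matrix} 02 \\ 10 \\ 10 \end{matrix} + 4P_{211} \begin{matrix} 11 \\ 10 \\ 01 \end{matrix} + P_{1111} \begin{matrix} 10 \\ 10 \\ 01 \\ 01 \end{matrix}$$

is a special case of $P(a_1 a_2 a_3 a_4)$ where $a_1 = 1_1$, $a_2 = 1_1$, $a_3 = 1_2$, $a_4 = 1_2$. Formula
{4} is also true where the $a_i$ units are not of the same kind. Thus

$$P(a_1 a_2) = P_2(a_1 + a_2) + P_{11}(a_1 a_2)$$

gives

$$P\begin{pmatrix} 11 \\ 10 \end{pmatrix} = P_2\, 21 + P_{11} \begin{matrix} 11 \\ 10 \end{matrix}$$

when $a_1 = 1_1 + 1_2$ and $a_2 = 1_1$.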


TABLE III

The Multipartite Number 111

          111     110     101     011     100
                  001     010     100     010
                                          001

111       P_1

110       P_2     P_11
001

101       P_2             P_11
010

011       P_2                     P_11
100

100       P_3     P_21    P_21    P_21    P_111
010
001

TABLE III (Continued)

The Multipartite Number 22

          22      21      12      20      11      20      02      11      10
                  01      10      02      11      01      10      10      10
                                                  01      10      01      01
                                                                          01

22        P_1

21        P_2     P_11
01

12        P_2             P_11
10

20        P_2                     P_11
02

11        P_2                             P_11
11

20        P_3     2P_21           P_21            P_111
01
01

02        P_3             2P_21   P_21                    P_111
10
10

11        P_3     P_21    P_21            P_21                    P_111
10
01

10        P_4     2P_31   2P_31   P_22    2P_22   P_211   P_211   4P_211  P_1111
10
01
01

Tables can be made for the partitions of the various multipartite numbers.
In Table III are presented values for the numbers 11, 111, 22.

When the units are indistinguishable 11 condenses to the w = 2 part of
Table I.

When the units are indistinguishable 111 condenses to the w = 3 part of
Table I.

When the units are alike 22 condenses to the w = 4 part of Table I.

49. Multivariate Distributions. The chief results of Chapters II, III, IV
also hold for multivariate distributions. Some additional definitions are necessary.
We suppose that the N variates $x_1, x_2, \cdots, x_N$ are replaced by the
Nr variates of the array

$$\begin{matrix}
{}_1x_1, & {}_1x_2, & \cdots, & {}_1x_N \\
{}_2x_1, & {}_2x_2, & \cdots, & {}_2x_N \\
\cdots & & & \\
{}_rx_1, & {}_rx_2, & \cdots, & {}_rx_N
\end{matrix} \qquad \{78\}$$

where the presubscript represents the variable. The power sums become

$$(a_1) = {}_1x_1^{a_1} + {}_1x_2^{a_1} + \cdots + {}_1x_N^{a_1} = \sum {}_1x_i^{a_1}$$

$$(a_2) = {}_2x_1^{a_2} + {}_2x_2^{a_2} + \cdots + {}_2x_N^{a_2} = \sum {}_2x_i^{a_2}$$

It is not necessary to utilize the presubscript since it is precisely the subscript
of the a. That is, the power sum $(a_t)$ is defined by $\sum x_i^{a_t}$. Similarly $(a_1 a_2)
= \sum_{i \ne j} {}_1x_i^{a_1}\, {}_2x_j^{a_2}$ can be written as $(a_1 a_2) = \sum_{i \ne j} x_i^{a_1} x_j^{a_2}$ without introducing ambiguity.

In general {6} as well as {4}, now holds for the multivariate case. It follows
at once that the results of Chapters II, III, IV can be written for the multivariate
case by means of the formulas of Chapter I as indicated by the previous
section. Thus the formula for $P(1_1 1_1 1_2 1_2)$ may be written as [Table III]

$$P[10 \cdot 10 \cdot 01 \cdot 01] = P_4\, 22 + 2P_{31}\, 21 \cdot 01 + 2P_{31}\, 12 \cdot 10 + P_{22}\, 20 \cdot 02 + 2P_{22}\, 11 \cdot 11 + P_{211}\, 20 \cdot 01 \cdot 01 + P_{211}\, 02 \cdot 10 \cdot 10 + 4P_{211}\, 11 \cdot 10 \cdot 01 + P_{1111}\, 10 \cdot 10 \cdot 01 \cdot 01$$

and can be interpreted as:

$$(10)(10)(01)(01) = (22) + 2(21 \cdot 01) + 2(12 \cdot 10) + (20 \cdot 02) + 2(11 \cdot 11) + (20 \cdot 01 \cdot 01) + (02 \cdot 10 \cdot 10) + 4(11 \cdot 10 \cdot 01) + (10 \cdot 10 \cdot 01 \cdot 01)$$

by {12} of Chapter II. It may also be interpreted as

$$(10 \cdot 10 \cdot 01 \cdot 01) = -6(22) + 4(21)(01) + 4(12)(10) + (20)(02) + 2(11)(11) - (20)(01)(01) - (02)(10)(10) - 4(11)(10)(01) + (10)(10)(01)(01)$$

by {44} of Chapter III. It can also be interpreted as a double expansion by
means of section 44 where the values of the P's are given by the usual

$$P_r = \sum (-1)^{p-1}(p-1)! \binom{r}{p_1^{\pi_1} \cdots p_s^{\pi_s}} k_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$$

$$P_{r_1 r_2 \cdots r_g} = P_{r_1} P_{r_2} \cdots P_{r_g}.$$
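The first two interpretations above can be checked on a small bivariate sample. The Python sketch below is only an illustration: the function names and the observations are assumptions made here, a part such as 21 is written as the pair `(2, 1)`, and the power product sums are evaluated by brute force over distinct subscripts.

```python
from itertools import permutations


def power_sum(data, part):
    """Bivariate power sum (ab): the sum of x**a * y**b over the observations."""
    a, b = part
    return sum(x ** a * y ** b for x, y in data)


def power_product_sum(data, parts):
    """Power product sum of the given parts, taken over distinct subscripts."""
    total = 0
    for subscripts in permutations(range(len(data)), len(parts)):
        term = 1
        for i, (a, b) in zip(subscripts, parts):
            x, y = data[i]
            term *= x ** a * y ** b
        total += term
    return total


data = [(1, 2), (3, 1), (2, 5), (4, 3), (5, 7)]  # hypothetical observations
s = lambda a, b: power_sum(data, (a, b))
pps = lambda *parts: power_product_sum(data, list(parts))

# Interpretation by {12}: a product of power sums as a sum of power product sums.
product_side = s(1, 0) ** 2 * s(0, 1) ** 2
sum_side = (pps((2, 2)) + 2 * pps((2, 1), (0, 1)) + 2 * pps((1, 2), (1, 0))
            + pps((2, 0), (0, 2)) + 2 * pps((1, 1), (1, 1))
            + pps((2, 0), (0, 1), (0, 1)) + pps((0, 2), (1, 0), (1, 0))
            + 4 * pps((1, 1), (1, 0), (0, 1))
            + pps((1, 0), (1, 0), (0, 1), (0, 1)))

# Interpretation by {44}: a power product sum in terms of power sums.
lhs = pps((1, 0), (1, 0), (0, 1), (0, 1))
rhs = (-6 * s(2, 2) + 4 * s(2, 1) * s(0, 1) + 4 * s(1, 2) * s(1, 0)
       + s(2, 0) * s(0, 2) + 2 * s(1, 1) * s(1, 1)
       - s(2, 0) * s(0, 1) * s(0, 1) - s(0, 2) * s(1, 0) * s(1, 0)
       - 4 * s(1, 1) * s(1, 0) * s(0, 1)
       + s(1, 0) * s(1, 0) * s(0, 1) * s(0, 1))
```

Both identities hold for any bivariate sample; the particular observations serve only to exercise them.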



50. Summary. It is apparent that {4} not only expresses (a) the number
of ways in which the parts of one partition may be collected to form the parts
of another partition, (b) the formula for expanding products of power sums
in terms of power product sums, (c) the formula expanding power product
sums in terms of power sums, and (d) the formula for double expansions, but
also that it can be used to make similar expansions in the case of multivariate
distributions.


BIBLIOGRAPHY 

(1) Binet: "Mémoire sur un système de formules analytiques etc." Journ. de l'Éc.
Polyt., IX (1812), cah. 16, pp. 280-302.

(2) Cauchy: "Note sur la formation des fonctions alternées qui servent à résoudre le
problème de l'élimination." Comptes Rendus, 12 (1841), pp. 414-428.

(3) Brioschi: "Sulle funzioni simmetriche delle radici di una equazione." Annali di
Tortolini, 6 (1854), pp. 422-428.

(4) Bellavitis: "Sposizione elementare della teorica dei determinanti." Memorie
... Istituto Veneto, 7 (1857), pp. 67-144.

(5) Mola: "Soluzione della quistione 5, 6, 7." Giornale di Matematiche, 3 (1865),
pp. 190-201.

(6) Hankel: "Darstellung symmetrischer Functionen durch die Potenzsummen."
Crelle's Journ., 67 (1865), pp. 90-94.

(7) E. D. Roe, Jr.: "Note on a formula of symmetric functions." American
Mathematical Monthly, 6 (1898), pp. 161-164.

(8) E. D. Roe, Jr.: "On the transcendental form of the resultant." American
Mathematical Monthly, 7 (1900), pp. 69-66.

(9) Muirhead: "Some proofs of Newton's theorems on the sums of powers of roots."
Edinburgh Mathematical Society Proceedings, 23 (1906), pp. 60-70.

(10) Muirhead: "A proof of Waring's expression for $\Sigma a^r$ in terms of the coefficients of
the equation." Edinburgh Mathematical Society Proceedings, 23 (1906),
pp. 71-74.

(11) Muir: "Waring's expression for symmetric functions in terms of sums of like powers."
Proceedings Edinburgh Math. Society, 27 (1909), pp. 6-9.

(12) A. A. Tchouproff: "On the mathematical expectation of moments of frequency
distributions." Biom., 12 (1918), pp. 140-169.

(13) A. E. R. Church: "On the moments of the distribution of the squared standard
deviations, etc." Biom., 17 (1926), pp. 79-83.

(14) A. E. R. Church: "On the means and squared standard deviations of small samples."
Biom., 18 (1926), pp. 321-394.

(15) H. C. Carver: "The fundamentals of sampling." Annals of Mathematical Statistics,
1 (1930), pp. 101-121.

(16) A. L. O'Toole: "On symmetric functions and symmetric functions of symmetric
functions." Annals of Mathematical Statistics, 2 (1931), pp. 101-149.

(17) A. L. O'Toole: "On symmetric functions of more than one variable and of frequency
functions." Annals of Mathematical Statistics, 3 (1932), pp. 60-63.

(18) D. E. Littlewood and A. R. Richardson: "Group characters and algebra." Phil.
Trans. Roy. Soc., A 233 (1934), pp. 99-141.

(19) P. S. Dwyer: "Moments of any rational integral isobaric sample moment function."
Annals of Mathematical Statistics, vol. viii, no. 1, Mar. 1937, pp. 21-66.

A. Paoli: "Elementi di algebra." Supplemento, Opuscolo II, 1804.

B. Hirsch: "Examples, formulae, and calculations on the literal calculus and algebra."
Translated from the German by J. A. Robb (1827).

C. Faà de Bruno: "Théorie des formes binaires" (1876).

D. Cayley: "Collected Mathematical Papers," VII (1894).

E. Burnside and Panton: "Theory of equations" (1881). Reference is to the seventh
edition (1912).

F. Chrystal: "Algebra" (1886). Reference is to the fifth edition.

G. Whitworth: "Choice and chance," 4th edition (1886).

H. T. N. Thiele: "Almindelig Iagttagelseslaere" (1889).

I. Thomas Muir: "Theory of determinants."

J. M. Bôcher: "Introduction to higher algebra" (1907).

K. MacMahon: "Combinatory analysis" (1915-16).

L. Thomas Muir: "Contributions to the history of determinants" (1900-1920).

M. Josephine Roe: "Interfunctional expressibility problems of symmetric functions"
(1931).

N. Josephine Roe: "Interfunctional expressibility tables of symmetric functions."
Distributed by Syracuse University (1931).



ON THE INDEPENDENCE OF CERTAIN ESTIMATES OF VARIANCE 1 


By Allen T. Craig 


1. Introduction. It is well known that a necessary and sufficient condition
that several statistics be independent in the probability sense, is that the
characteristic function of the joint distribution of these statistics shall equal
identically the product of the characteristic functions of the distributions of the
individual statistics. Thus, if $x_1, x_2, \cdots, x_N$ are N independently observed
values of a variable x which is subject to the distribution function $f(x)$, and if
$\theta_1, \theta_2, \cdots, \theta_s$ are s statistics, each computed from the N observed values of x,
the characteristic function of the joint distribution of the s statistics is given by

$$\varphi(t_1, \cdots, t_s) = \int \cdots \int e^{i t_1 \theta_1 + \cdots + i t_s \theta_s} f(x_1) \cdots f(x_N)\, dx_N \cdots dx_1.$$

Here, $i = \sqrt{-1}$ and the limits of integration are taken so as to include all
admissible values of x. Since the characteristic function of the distribution of
$\theta_v$, $v = 1, 2, \cdots, s$, is given by

$$\varphi_v(t_v) = \int \cdots \int e^{i t_v \theta_v} f(x_1) \cdots f(x_N)\, dx_N \cdots dx_1,$$

the necessary and sufficient condition for the independence of the s statistics
can be written

$$\varphi(t_1, \cdots, t_s) = \varphi_1(t_1) \cdots \varphi_s(t_s)$$

for all real values of $t_1, t_2, \cdots, t_s$.

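An important phase of sampling theory in statistics is that in which the
variable x is subject to the normal distribution function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}, \qquad -\infty < x < \infty,$$

and $\theta_1, \cdots, \theta_s$ are s real symmetric quadratic forms in the N independently
observed values of x. That is,

$$\theta_1 = \sum_{j=1}^{N} \sum_{k=1}^{N} a_{jk} x_j x_k,$$

$$\theta_2 = \sum_{j=1}^{N} \sum_{k=1}^{N} b_{jk} x_j x_k,$$

$$\cdots$$

$$\theta_s = \sum_{j=1}^{N} \sum_{k=1}^{N} q_{jk} x_j x_k,$$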

1 Presented to the Institute of Mathematical Statistics on December 30, 1937, at the
invitation of the program committee. In the paper, we discuss, from a slightly different
point of view, some of the material found in the references given at the close of the paper.



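so that

$$\varphi(t_1, \cdots, t_s) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{N} \int \cdots \int e^{T}\, dx_N \cdots dx_1,$$

where $T = i t_1 \sum \sum a_{jk} x_j x_k + \cdots + i t_s \sum \sum q_{jk} x_j x_k - \dfrac{1}{2\sigma^2} \sum x_j^2$. If
$A_1, \cdots, A_s$ denote the real symmetric matrices of the s quadratic forms, the
characteristic function can be written

$$\varphi(t_1, \cdots, t_s) = |\, I - 2i\sigma^2 t_1 A_1 - \cdots - 2i\sigma^2 t_s A_s \,|^{-1/2},$$

where I is the unit matrix of order N and the vertical bars indicate the determinant
of the matrix within them. Similarly, the characteristic function of
the distribution of $\theta_v$ is given by

$$\varphi_v(t_v) = |\, I - 2i\sigma^2 t_v A_v \,|^{-1/2},$$

so that a necessary and sufficient condition for the independence of the s real
symmetric quadratic forms can be written

$$(3) \qquad |\, I - 2i\sigma^2 t_1 A_1 - \cdots - 2i\sigma^2 t_s A_s \,| = \prod_{v=1}^{s} |\, I - 2i\sigma^2 t_v A_v \,|,$$

for all real values of $t_1, \cdots, t_s$.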

Although equation (3) is fundamental and is of considerable value in certain 
problems, it should be remarked that it is frequently rather tedious to use. 
This suggests that by strengthening the hypotheses, it may be possible to 
establish another necessary and sufficient condition which, in certain cases, 
may be easier to use. 


2. Certain quadratic forms. In order to lead up to such a theorem as that
suggested at the close of the last section, we first consider two theorems regarding
real symmetric matrices.

Theorem I. Let $A_1, A_2, \cdots, A_s$ be s real symmetric matrices, each of order N,
such that $A_1 + A_2 + \cdots + A_s = I$, where I is the unit matrix of order N. Let
$r_v$, $v = 1, 2, \cdots, s$, be respectively the ranks of the matrices $A_v$. If $r_1 + r_2 + \cdots
+ r_s = N$, each of the non-zero roots of the characteristic equations² of the matrices
$A_v$ is +1.

² By the characteristic equation of the square matrix A is meant the algebraic equation
of degree N in $\lambda$, $|A - \lambda I| = 0$. If A is real and symmetric and the rank of A is r, the
characteristic equation has exactly r real non-zero roots and N - r zero roots. Cf. Kowalewski,
Einführung in die Determinanten-Theorie (1906) pp. 126-128.

If s = 2, the theorem is almost self-evident. For the characteristic equation
of $A_2$ is $|A_2 - \lambda I| = 0$, which, since $A_1 + A_2 = I$, can be written $|I - A_1 -
\lambda I| = 0$ or $|A_1 - (1 - \lambda)I| = 0$. But the last equation is the characteristic
equation of $A_1$ with $\lambda$ replaced by $1 - \lambda$. Thus the roots of the equation
$|A_1 - \lambda I| = 0$ are one minus the roots of $|A_2 - \lambda I| = 0$. Since the equation
$|A_2 - \lambda I| = 0$ has $N - r_2$ zero roots, the equation $|A_1 - \lambda I| = 0$ has $N - r_2$
roots equal to +1. But $r_1 = N - r_2$ so that all the non-zero roots of $|A_1 -
\lambda I| = 0$ are +1. A similar statement holds for the roots of $|A_2 - \lambda I| = 0$.

In general, we have $A_1 + A_2 + \cdots + A_s = I$ and $r_1 + r_2 + \cdots + r_s = N$.
Let $B_1 = A_2 + A_3 + \cdots + A_s$ and denote by $R_1$ the rank of $B_1$. Thus³
$R_1 \le r_2 + r_3 + \cdots + r_s$. Now $A_1 + B_1 = I$ and the equation $|A_1 - \lambda I| = 0$
has exactly $N - r_1$ zero roots. Since the roots of $|B_1 - \lambda I| = 0$ are one minus
the roots of $|A_1 - \lambda I| = 0$, the first of these two equations has at least $N - r_1$
non-zero roots so that $R_1 \ge N - r_1 = r_2 + r_3 + \cdots + r_s$. From $r_2 + r_3 + \cdots
+ r_s \le R_1 \le r_2 + r_3 + \cdots + r_s$ we deduce the equality so that the argument
in the case of s = 2 applies to the matrices $A_1$ and $B_1$. In particular, then,
each of the non-zero roots of $|A_1 - \lambda I| = 0$ is +1. By writing $B_2 = A_1 +
A_3 + \cdots + A_s$, $B_3 = A_1 + A_2 + A_4 + \cdots + A_s$, and so on, and repeating
the argument in each instance, we see that the theorem holds.
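A small exact computation illustrates Theorem I for s = 2. The Python sketch below is an assumption-laden example, not part of the paper: it takes N = 4, lets $A_1$ be the matrix with every element 1/N and $A_2 = I - A_1$, and uses the standard fact (not stated in the text) that a real symmetric matrix has all its non-zero characteristic roots equal to +1 exactly when it equals its own square. The helper names are hypothetical.

```python
from fractions import Fraction

N = 4  # order of the matrices (hypothetical choice)
identity = [[Fraction(int(i == j)) for j in range(N)] for i in range(N)]

# A1 has every element 1/N; A2 = I - A1, so that A1 + A2 = I.
A1 = [[Fraction(1, N) for _ in range(N)] for _ in range(N)]
A2 = [[identity[i][j] - A1[i][j] for j in range(N)] for i in range(N)]


def matmul(A, B):
    """Product of two square matrices of order N."""
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def rank(A):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    rows = [row[:] for row in A]
    r = 0
    for col in range(N):
        pivot = next((i for i in range(r, N) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, N):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


r1, r2 = rank(A1), rank(A2)
```

Here the ranks are 1 and N - 1, their sum is N, and each matrix equals its own square, so every non-zero root is +1 as the theorem requires.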

Theorem II. Let $A_1, A_2, \cdots, A_s$ be s real symmetric matrices which satisfy
the conditions of Theorem I. There then exist s - 1 real orthogonal matrices of
order N, say $L_1, L_2, \cdots, L_{s-1}$, such that each of the s matrices

$$L'_{s-1} \cdots L'_1 A_v L_1 \cdots L_{s-1}, \qquad v = 1, 2, \cdots, s,$$

is a diagonal matrix⁴ with the $r_v$ non-zero elements on the principal diagonal equal
to +1. Necessarily, the sum of these s matrices is the identity matrix.

³ Cf. Bôcher, Introduction to Higher Algebra (1921) p. 62.

⁴ By a diagonal matrix we mean a matrix whose elements not on the principal diagonal
are zero.

In proof of the theorem we shall, to save space, restrict ourselves to the case
of s = 3, although the method we use will be readily seen to be entirely general.
Since $A_1$ is real and symmetric and since, by Theorem I, the non-zero roots of
the characteristic equation of $A_1$ are +1, there exists a real orthogonal matrix of
order N, say $L_1$, such that

$$L_1' A_1 L_1 = \left[\begin{array}{ccc|ccc}
1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 1 & 0 & \cdots & 0 \\ \hline
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0
\end{array}\right],$$

where $L_1'$ is the conjugate of $L_1$ and where, merely as a convenience of notation,
we have placed the $r_1$ non-vanishing elements of the principal diagonal in the
first $r_1$ rows and columns. If then, in both members of the equation $A_1 + A_2 +
A_3 = I$, we multiply on the left by $L_1'$ and on the right by $L_1$, we have

$$(4) \qquad \left[\begin{array}{ccc|ccc}
1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 1 & 0 & \cdots & 0 \\ \hline
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0
\end{array}\right] + L_1' A_2 L_1 + L_1' A_3 L_1 = I,$$

since $L_1' I L_1 = I L_1' L_1 = I$. The matrices $L_1' A_2 L_1$ and $L_1' A_3 L_1$ are real,
symmetric, and the ranks are $r_2$ and $r_3$, since $L_1$ is non-singular. Moreover, the
non-zero roots of the characteristic equations of the two matrices are +1; for
$|L_1' A_2 L_1 - \lambda I| = |L_1'(A_2 - \lambda I)L_1| = |L_1'|\,|A_2 - \lambda I|\,|L_1|$, and similarly
for the matrix $L_1' A_3 L_1$. Now if a real symmetric matrix is positive definite,
that is, if all the non-zero roots of its characteristic equation are positive, then
all the elements on the principal diagonal are positive or zero, and, if an element
on the principal diagonal is zero, all the elements in the row and column in which
that element lies are zero. These two facts regarding a real symmetric positive
definite matrix, in conjunction with equation (4), require that the matrices
$L_1' A_2 L_1$ and $L_1' A_3 L_1$ be of the forms

$$\left[\begin{array}{ccc|ccc}
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 \\ \hline
0 & \cdots & 0 & b_{r_1+1,\,r_1+1} & \cdots & b_{r_1+1,\,N} \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & b_{N,\,r_1+1} & \cdots & b_{NN}
\end{array}\right]
\quad \text{and} \quad
\left[\begin{array}{ccc|ccc}
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 \\ \hline
0 & \cdots & 0 & c_{r_1+1,\,r_1+1} & \cdots & c_{r_1+1,\,N} \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & c_{N,\,r_1+1} & \cdots & c_{NN}
\end{array}\right]$$

respectively. Now the real symmetric matrix 






















to simplify the notation, we have placed the Vi non-vanishing elements of 
the principal diagonal in the first t% rows and columns. Consider the orthogonal 
matrix of order N 



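$$
L_2 =
\begin{pmatrix}
1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & w_{r_1+1,\,r_1+1} & \cdots & w_{r_1+1,\,N} \\
\vdots & & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & w_{N,\,r_1+1} & \cdots & w_{NN}
\end{pmatrix}
$$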


It is evident that $L_2'(L_1' A_1 L_1)L_2 = L_1' A_1 L_1$. If then, both members of $L_1' A_1 L_1 + L_1' A_2 L_1 + L_1' A_3 L_1 = I$ are multiplied on the left by $L_2'$ and on the right by $L_2$,
we get


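$$
\operatorname{diag}(\underbrace{1, \dots, 1}_{r_1}, 0, \dots, 0)
+ \operatorname{diag}(\underbrace{0, \dots, 0}_{r_1}, \underbrace{1, \dots, 1}_{r_2}, 0, \dots, 0)
+ \begin{pmatrix}
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & d_{r_1+1,\,r_1+1} & \cdots & d_{r_1+1,\,N} \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & d_{N,\,r_1+1} & \cdots & d_{NN}
\end{pmatrix}
= I.
$$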

From this last equation, it follows that $d_{jk} = 0$, $j \neq k$; $d_{jj} = 0$, $j = r_1 + 1, \dots, r_1 + r_2$; and $d_{jj} = 1$, $j = r_1 + r_2 + 1, \dots, N$. The third matrix in the
left member of the preceding equation then takes the form


$$
\operatorname{diag}(\underbrace{0, \dots, 0}_{r_1 + r_2}, \underbrace{1, \dots, 1}_{r_3}).
$$



































This establishes Theorem II when s = 3. The procedure may be continued
in a fairly obvious manner so as to justify the theorem for any finite positive
integer s.

With the aid of Theorems I and II, we are now able to state and prove a very 
useful theorem on the independence of certain quadratic forms of normally 
and independently distributed variables. The theorem follows. 

Theorem III. Let $x_1, x_2, \dots, x_N$ be N independent values of a normally
distributed variable x and let $\theta_1, \dots, \theta_s$ be s real symmetric quadratic forms in
these N variables, where $\sum_{v=1}^{s} \theta_v = \sum_{j=1}^{N} x_j^2$. If $r_1, r_2, \dots, r_s$ denote respectively the
ranks of the quadratic forms, a necessary and sufficient condition that the s forms be
independent in the probability sense is that $r_1 + r_2 + \cdots + r_s = N$.

Consider the characteristic function of the joint distribution of the s forms as
given by equation (2). In accordance with Theorem II, we can successively
introduce new variables by performing real linear transformations with orthogonal
matrices $L_1, L_2, \dots, L_{s-1}$ respectively in such a way that the exponent in the integrand of (2) becomes⁶

$$
i t_1 \sum_{j=1}^{r_1} y_j^2 + i t_2 \sum_{j=r_1+1}^{r_1+r_2} y_j^2 + \cdots + i t_s \sum_{j=r_1+\cdots+r_{s-1}+1}^{N} y_j^2 - \frac{1}{2\sigma^2} \sum_{j=1}^{N} y_j^2 .
$$


Since each transformation is orthogonal, the absolute value of the Jacobian in
each instance is unity. Thus the right member of (2) can now be written as the
product of s sets of integrals, the sets containing $r_1, r_2, \dots, r_s$ integrals respectively. That is,

$$
\varphi(t_1, t_2, \dots, t_s) = \varphi_1(t_1)\,\varphi_2(t_2) \cdots \varphi_s(t_s),
$$

which is equation (1). Hence the theorem.

Under the conditions of Theorem III, the characteristic function of the
distribution of $\theta_v$ is found by direct integration to be

$$
\varphi_v(t) = (1 - 2 i \sigma^2 t)^{-r_v/2}.
$$


6 If the variables in a symmetric quadratic form with matrix A are transformed by a
linear transformation with matrix B, the new form has the matrix B'AB. Cf. Bôcher,
p. 129. It should be remarked that these s - 1 successive orthogonal transformations can
be combined into a single orthogonal transformation with matrix $L = L_1 L_2 \cdots L_{s-1}$. For
if, by means of a linear transformation with matrix $L_1$, we pass from the variables $x_1, \dots, x_N$
to the variables $x_1', \dots, x_N'$, in which the old variables are expressed explicitly in terms
of the new, and thence to variables $x_1'', \dots, x_N''$ by means of a linear transformation with
matrix $L_2$, the transformation with matrix $L_1 L_2$ will carry us directly from the $x$'s to the
$x''$'s. This extends to any finite number of transformations. Since the product of any two
orthogonal matrices is an orthogonal matrix (and hence the product of a finite number of
them), we see that the remark is justified. Cf. Bôcher, p. 68 and Kowalewski, p. 181. Note
that Bôcher expresses the new variables explicitly in terms of the old.





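Thus,

$$
f(\theta_v) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-it\theta_v} \varphi_v(t)\, dt
= \frac{\theta_v^{\frac{r_v}{2} - 1} e^{-\theta_v / 2\sigma^2}}{(2\sigma^2)^{r_v/2}\, \Gamma(r_v/2)},
$$

so that the variables $\theta_v/\sigma^2$ are distributed in accordance with Chi-square distributions
with $r_v$ degrees of freedom.⁷ Accordingly, when the conditions of
Theorem III are satisfied, we may deduce not merely the mutual independence
of the $\theta_v$ but also the nature of their distributions.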

3. Applications to the analysis of variance. In the analysis of variance,
N = ab independently observed values of a normally distributed variable are
classified into a rows and b columns in accordance with some relevant scheme:

$$
\begin{matrix}
x_{11}, & x_{12}, & \cdots, & x_{1b} \\
x_{21}, & x_{22}, & \cdots, & x_{2b} \\
\vdots & \vdots & & \vdots \\
x_{a1}, & x_{a2}, & \cdots, & x_{ab}.
\end{matrix}
$$

With the notation $\bar{x}_{j\cdot}$, $\bar{x}_{\cdot k}$, $\bar{x}$ to denote respectively the arithmetic mean of the
jth row, the kth column, and the entire set, it is readily seen that

$$
\sum_j \sum_k (x_{jk} - \bar{x})^2 = b \sum_j (\bar{x}_{j\cdot} - \bar{x})^2 + a \sum_k (\bar{x}_{\cdot k} - \bar{x})^2 + \sum_j \sum_k (x_{jk} - \bar{x}_{j\cdot} - \bar{x}_{\cdot k} + \bar{x})^2 = \theta_1 + \theta_2 + \theta_3 \tag{5}
$$

is an identity in the N = ab values of x. It is quite straightforward to exhibit
each of the three terms in the right member of (5) as a real symmetric quadratic
form in the N variables $x_{jk}$ and to show that the ranks are $r_1 = a - 1$, $r_2 = b - 1$,
$r_3 = (a - 1)(b - 1)$. By the device of adding $\theta_4 = \frac{1}{ab}\left(\sum_j \sum_k x_{jk}\right)^2 = ab\,\bar{x}^2$ to
both members of (5), we have $\sum_j \sum_k x_{jk}^2 = \theta_1 + \theta_2 + \theta_3 + \theta_4$. Moreover, the rank
of $\theta_4$ is $r_4 = 1$. Thus $r_1 + r_2 + r_3 + r_4 = ab = N$ and, by Theorem III, we see
that the four quadratic forms are mutually independent. In particular, $\theta_1$, $\theta_2$
and $\theta_3$ are independent, and each, measured in units of $\sigma^2$, is distributed as is
Chi-square with its appropriate number of degrees of freedom.
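
The rank count used in this application can be checked numerically. The following is a minimal sketch and not part of the original paper: it assumes the usual representation of the four forms by row, column and grand-mean averaging matrices, and the helper names `rank` and `anova_ranks` are illustrative only. It builds the matrices of the four forms for an a by b layout in exact rational arithmetic and finds their ranks by Gaussian elimination.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix of Fractions, by exact Gaussian elimination."""
    rows = [list(row) for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        # Look for a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def anova_ranks(a, b):
    """Ranks of the matrices of theta_1, ..., theta_4 for an a-by-b layout."""
    n = a * b
    cells = [(j, k) for j in range(a) for k in range(b)]
    zero = Fraction(0)
    # Averaging matrices (assumed representation): row mean, column mean, grand mean.
    p_row = [[Fraction(1, b) if u[0] == v[0] else zero for v in cells] for u in cells]
    p_col = [[Fraction(1, a) if u[1] == v[1] else zero for v in cells] for u in cells]
    p_all = [[Fraction(1, n) for _ in cells] for _ in cells]
    ident = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    def combine(terms):
        # Entrywise sum of sign * matrix over the given (sign, matrix) pairs.
        return [[sum(s * m[i][j] for s, m in terms) for j in range(n)]
                for i in range(n)]

    theta1 = combine([(1, p_row), (-1, p_all)])                            # rows
    theta2 = combine([(1, p_col), (-1, p_all)])                            # columns
    theta3 = combine([(1, ident), (-1, p_row), (-1, p_col), (1, p_all)])   # residual
    theta4 = p_all                                                         # ab * mean^2
    return [rank(m) for m in (theta1, theta2, theta3, theta4)]


ranks = anova_ranks(3, 4)
print(ranks, sum(ranks))
```

For a = 3 and b = 4 the ranks come out as a - 1 = 2, b - 1 = 3, (a - 1)(b - 1) = 6 and 1, which add to N = 12, so the condition of Theorem III is met.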

The University of Iowa.


7 By the number of degrees of freedom of a real symmetric quadratic form of normally
and independently distributed variables, we mean the rank of the matrix of the form.





REFERENCES 

(1) M. S. Bartlett and J. Wishart, The distribution of second order moment statistics
in a normal system. Proceedings of the Cambridge Philosophical Society,
vol. 28 (1931-32) pp. 455-459.

The generalized product moment distribution in a normal system. Same journal,
vol. 29 (1932-33) pp. 260-270.

(2) W. G. Cochran, The distribution of quadratic forms in a normal system. Proceedings
of the Cambridge Philosophical Society, vol. 30 (1933-34) pp. 178-191.

(3) R. A. Fisher, Applications of Student's distribution. Metron, vol. 5 (1925) pp.
90-104.

Statistical Methods for Research Workers (1934) pp. 210-272.

(4) S. S. Wilks, Statistical Inference (1936-37) pp. 38, 44-45.



VARIANCE OF A GENERAL MATCHING PROBLEM* 

By Joseph A. Greenwood

Let us match two decks of cards: (A) composed of t distinct groups of s
identical symbols each, and (B) a target deck composed of $i_1$ symbols of the
first kind, $i_2$ of the second, etc., such that

$$
i_1 + i_2 + \cdots + i_t = n. \tag{1}
$$

It is not necessary that all the i's be different from zero.

(a) Forming the Relative Frequency Table. The first part of the paper is
concerned with forming a 2x2-way table showing the relative frequencies of
hits and misses of all pairs of cards in the target deck. The notation $\overset{i}{0}$ indicates
a miss at the ith card of the target deck, $\overset{i}{1}$ a hit; $\overset{i}{0} = j$ indicates a miss at the
ith card, with the matching card identical to the jth target card.

Case I. ith and jth target cards the same symbol

    i       j          Theoretical freq.   Weighted freq.        Entry
    If 0    then¹ 0    n - s - 1           (t - 1)(n - s - 1)    2.1
    0       1          s                   (t - 1)s = n - s      2.2
    1       0          n - s               n - s                 2.3
    1       1          s - 1               s - 1                 2.4
                                           Total = t(n - 1)      (2)

But $\overset{i}{0}$ occurs in (t - 1)/t of the events. Thus we must weight 2.1 and 2.2
with a factor (t - 1), giving the last column in (2).
Case II. ith and jth target cards different

    i            j          Theoretical freq.   Weighted freq.        Entry
    If 0 = j     then 0     n - s               n - s                 3.1
    0 = j        1          s - 1               s - 1                 3.2
    0 ≠ j        0          n - s - 1           (n - s - 1)(t - 2)    3.3
    0 ≠ j        1          s                   s(t - 2)              3.4
    1            0          n - s - 1           n - s - 1             3.5
    1            1          s                   s                     3.6
                                                Total = t(n - 1)      (3)


* Presented to the American Mathematical Society, September 9, 1937. 
1 Read, 'then out of n - 1 times'.




But $\overset{i}{0} = j$ occurs in 1/(t - 1) of all events $\overset{i}{0}$, and $\overset{i}{1}$ occurs in 1/t of all events
$\overset{i}{0} + \overset{i}{1}$. Therefore entries 3.3 and 3.4 must be weighted with the factor (t - 2),
and then entries 3.1, 3.2, 3.3 and 3.4 must be weighted with the factor (t - 1).
It is important that the totals of the two parts to be weighted be equal before
the weighting factors are applied. This gives rise to the last column in table (3).
Now the number of ways the ith card can be like the jth card of the target
deck is²

$$
\alpha_1 = \sum_{v=1}^{t} \binom{i_v}{2}.
$$

The number of ways they can be unequal is

$$
\alpha_2 = \sum_{u<v} i_u i_v. \tag{4}
$$

Since the totals of the last columns of the two tables are equal we weight the
entries of their last columns with $\alpha_1$ and $\alpha_2$, respectively. So, combining 3.1,
3.3 and 3.2, 3.4 we form $\alpha_1$ times (2) + $\alpha_2$ times (3) to give the new table


    i    j    Relative frequencies
    0    0    $(n - s - 1)(t - 1)\alpha_1 + [(t - 1)(n - s - 1) + 1]\alpha_2$
    0    1    $(n - s)\alpha_1 + (n - s - 1)\alpha_2$
    1    0    $(n - s)\alpha_1 + (n - s - 1)\alpha_2$
    1    1    $(s - 1)\alpha_1 + s\alpha_2$                                        (5)

Now using the entries from (5) form the 2x2-way table

               j = 0                                                            j = 1                                      Total
    i = 0      $(t - 1)(n - s - 1)\alpha_1 + [(t - 1)(n - s - 1) + 1]\alpha_2$    $(n - s)\alpha_1 + (n - s - 1)\alpha_2$    $(tn - n - t + 1)(\alpha_1 + \alpha_2)$
    i = 1      $(n - s)\alpha_1 + (n - s - 1)\alpha_2$                            $(s - 1)\alpha_1 + s\alpha_2$              $(n - 1)(\alpha_1 + \alpha_2)$
    Total      $(tn - n - t + 1)(\alpha_1 + \alpha_2)$                            $(n - 1)(\alpha_1 + \alpha_2)$             $t(n - 1)(\alpha_1 + \alpha_2)$      (6)


2 If $i_v < 2$ define $\binom{i_v}{2} = 0$.






(b) Obtaining the Correlation, Variance and Maximal Conditions. Substituting
from table (6) into the formulas given by Yule³ for $\delta$ and the coefficient
of correlation r, we obtain the average correlation

$$
r = \frac{\alpha_2 + (1 - t)\alpha_1}{(tn - n - t + 1)(\alpha_1 + \alpha_2)}
= \frac{t\alpha_2 + (1 - t)\binom{n}{2}}{\binom{n}{2}(tn - n - t + 1)} \tag{7}
$$

by (4).

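We now give a proof that r is a maximum when $i_j = s$ ($j = 1, \dots, t$). From
(7) it is sufficient to show that under the same conditions $\alpha_2$ is a maximum.

Let $i_j = s + \delta_j$, then

$$
\sum_{j=1}^{t} \delta_j = 0 \quad \text{by (1)}. \tag{8}
$$

$$
\alpha_2 = \sum_{u<v} (s + \delta_u)(s + \delta_v) = \binom{t}{2} s^2 + \sum_{u<v} \delta_u \delta_v \quad \text{by (8)}.
$$

Assume some $\delta_v \neq 0$ and

$$
\sum_{u<v} \delta_u \delta_v \geq 0. \tag{9}
$$

Add

$$
\sum_{v=1}^{t} \delta_v^2 + \sum_{u<v} \delta_u \delta_v
$$

to both sides of (9). Then

$$
\Bigl(\sum_{v=1}^{t} \delta_v\Bigr)^2 \geq \sum_{v=1}^{t} \delta_v^2 + \sum_{u<v} \delta_u \delta_v \tag{10}
$$

or $0 \geq$ a positive number. This necessarily implies the desired result.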

3 Yule, G. U. An Introduction to the Theory of Statistics, London: Griffin and Co.,
1927, pp. 216-217. The table can be symbolized with

                               Total
             a_1     a_2       b_1
             a_3     a_4       b_2
    Total    c_1     c_2       c_3

and $\delta = a_1 - (c_1 b_1 / c_3)$. He then gives the correlation coefficient, $r = \delta c_3 / \sqrt{c_1 c_2 b_1 b_2}$.






Yule⁴ gives an expression for the variance in a situation which includes the
present problem as a special case, to be

$$
\sigma^2 = npq[1 + r(n - 1)], \tag{11}
$$

where r is the average correlation between all pairs of variables. Substituting
our result (7) in (11) with p = 1/t gives the desired variance.

It is interesting to note that when $i_j = s$ ($j = 1, \dots, t$), r reduces to $1/(n - 1)^2$,
giving

$$
\sigma^2 = \frac{n^2(t - 1)}{t^2(n - 1)} = \sigma_b^2\, \frac{n}{n - 1},
$$

where $\sigma_b^2$ is the variance of the binomial case.⁵
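
The result can be checked on small decks. What follows is an illustrative sketch rather than the author's computation: the function names are hypothetical, (7) is used in its second form with $\alpha_2$ obtained as $\binom{n}{2}$ minus the number of like pairs, and the brute-force check simply enumerates every ordering of deck (A) against a fixed target deck.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def average_correlation(counts, s):
    """Average correlation r of (7); counts = (i_1, ..., i_t), groups of size s."""
    t = len(counts)
    n = s * t
    assert sum(counts) == n, "condition (1): the i's must add up to n = st"
    # alpha_2: number of pairs of target cards bearing different symbols.
    alpha2 = comb(n, 2) - sum(comb(i, 2) for i in counts)
    return Fraction(t * alpha2 + (1 - t) * comb(n, 2),
                    comb(n, 2) * (t * n - n - t + 1))


def matching_variance(counts, s):
    """Variance (11): n p q [1 + r (n - 1)] with p = 1/t."""
    t = len(counts)
    n = s * t
    p = Fraction(1, t)
    return n * p * (1 - p) * (1 + average_correlation(counts, s) * (n - 1))


def brute_force_variance(counts, s):
    """Variance of the number of hits over all orderings of deck (A)."""
    t = len(counts)
    target = [sym for sym, i in enumerate(counts) for _ in range(i)]
    deck = [sym for sym in range(t) for _ in range(s)]
    hits = [sum(x == y for x, y in zip(order, target))
            for order in permutations(deck)]
    mean = Fraction(sum(hits), len(hits))
    return sum((h - mean) ** 2 for h in hits) / len(hits)


print(matching_variance([2, 2], 2), brute_force_variance([2, 2], 2))
print(matching_variance([3, 1], 2), brute_force_variance([3, 1], 2))
```

With t = 2 and s = 2 the balanced target (2, 2) gives r = 1/9 and variance 4/3, in agreement with $n^2(t - 1)/t^2(n - 1)$; the unbalanced target (3, 1) gives r = 0 and variance 1. Enumeration is only practical for very small n.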
Duke University.


4 Op. cit., p. 286.

5 Concerning this special case see also Bartlett, M. S. Properties of sufficiency and
statistical tests. Proc. Royal Soc. A, 1937, CLX, 268-282.

Olds, E. G. A moment-generating function useful in certain matching problems. Abstract
No. 428, Bull. Amer. Math. Soc., 1937, XLIII, 779.



THE LARGE-SAMPLE DISTRIBUTION OF THE LIKELIHOOD RATIO
FOR TESTING COMPOSITE HYPOTHESES¹

By S. S. Wilks


By applying the principle of maximum likelihood, J. Neyman and E. S.
Pearson² have suggested a method for obtaining functions of observations for
testing what are called composite statistical hypotheses, or simply composite
hypotheses. The procedure is essentially as follows: A population K is assumed
in which a variate x (x may be a vector with each component representing a
variate) has a distribution function $f(x, \theta_1, \theta_2, \dots, \theta_h)$, which depends on the
parameters $\theta_1, \theta_2, \dots, \theta_h$. A simple hypothesis is one in which the $\theta$'s have
specified values. A set $\Omega$ of admissible hypotheses is considered which consists
of a set of simple hypotheses. Geometrically, $\Omega$ may be represented as a
region in the h-dimensional space of the $\theta$'s. A set $\omega$ of simple hypotheses is
specified by taking all simple hypotheses of the set $\Omega$ for which $\theta_i = \theta_{0i}$, $i = m + 1, m + 2, \dots, h$.

A random sample $O_n$ of n individuals is considered from K. $O_n$ may be
geometrically represented as a point in an n-dimensional space of the x's. The
probability density function associated with $O_n$ is

$$
P = \prod_{\alpha=1}^{n} f(x_\alpha, \theta_1, \theta_2, \dots, \theta_h). \tag{1}
$$

Let $P_\Omega(O_n)$ be the least upper bound of P for the simple hypotheses in $\Omega$, and
$P_\omega(O_n)$ the least upper bound of P for those in $\omega$. Then

$$
\lambda = \frac{P_\omega(O_n)}{P_\Omega(O_n)} \tag{2}
$$

is defined as the likelihood ratio for testing the composite hypothesis H that
$O_n$ is from a population with a distribution characterized by values of the $\theta_i$
for some simple hypothesis in the set $\omega$. When we say that H is true, we shall
mean that $O_n$ is from some population of the set just described. In most of the
cases of any practical importance, P and its first and second derivatives with
respect to the $\theta_i$ are continuous functions of the $\theta_i$ almost everywhere in a certain
region of the $\theta$-space for almost all possible samples $O_n$. We shall only consider
the case in which $P_\Omega(O_n)$ and $P_\omega(O_n)$ can be determined from the first and
second order derivatives with respect to the $\theta$'s.


1 Presented to the American Mathematical Society, March 26, 1937.
2 Phil. Trans. Roy. Soc. London, Ser. A, Vol. 231, p. 295.



MAXIMUM LIKELIHOOD FOR TESTING COMPOSITE HYPOTHESES 




A considerable number of currently used statistical functions for making tests
of significance can be expressed in terms of $\lambda$ ratios, and in many cases involving
normal distribution theory, the exact sampling distribution of $\lambda$ is known.
However, it is often useful when dealing with large samples to have an approximation
to the distribution of $\lambda$. We shall consider such an approximation for
those cases (which include most of the ones of any practical importance) in
which optimum estimates of the $\theta$'s exist. That is, we shall assume the existence
of functions $\tilde\theta_i(x_1, \dots, x_n)$ (maximum likelihood estimates of the $\theta_i$) such that³
their distribution is


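$$
\frac{|c_{ij}|^{1/2}}{(2\pi)^{h/2}}\; e^{-\frac{1}{2} \sum_{i,j=1}^{h} c_{ij} z_i z_j}\, (1 + \phi)\, dz_1 \cdots dz_h \tag{3}
$$

where $z_i = (\tilde\theta_i - \theta_i)\sqrt{n}$, $c_{ij} = -E\left(\dfrac{\partial^2 \log f}{\partial\theta_i\, \partial\theta_j}\right)$, E denoting mathematical expectation,
and $\phi$ is of order $1/\sqrt{n}$ and $\|c_{ij}\|$ is positive definite. Denoting (3) by
$J\, dz_1\, dz_2 \cdots dz_h$, and differentiating J with respect to $\theta_k$, we get

$$
\frac{1}{J} \frac{\partial J}{\partial \theta_k} = \frac{1}{2|c_{ij}|} \frac{\partial |c_{ij}|}{\partial \theta_k} - \frac{1}{2} \sum_{i,j=1}^{h} \frac{\partial c_{ij}}{\partial \theta_k} z_i z_j + \sqrt{n} \sum_{i=1}^{h} c_{ik} z_i + \frac{1}{1 + \phi} \frac{\partial \phi}{\partial \theta_k}, \qquad k = 1, 2, \dots, h. \tag{4}
$$

Since $c_{ij} = O(1)$ and $|c_{ij}| \neq 0$, it can be seen from (4) that the values of $\theta_k$
which maximize J differ from $\tilde\theta_k$, $k = 1, 2, \dots, h$, by terms of order $1/\sqrt{n}$.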
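Therefore, the maximum $P_\Omega(O_n)$ of J with respect to the $\theta_k$ is

$$
P_\Omega(O_n) = \frac{|c_{ij}|^{1/2}}{(2\pi)^{h/2}}\, (1 + \phi'), \tag{5}
$$

where $\phi' = O(1/\sqrt{n})$.

To get $P_\omega(O_n)$, we let $\theta_i = \theta_{0i}$, $i = m + 1, m + 2, \dots, h$, and note that J can
be written as

$$
J = \frac{|c_{0ij}|^{1/2}}{(2\pi)^{h/2}} \exp\Bigl\{-\tfrac{1}{2}\Bigl[\sum_{i,j=1}^{m} c_{0ij}(z_i - L_i)(z_j - L_j) + \chi_0^2\Bigr]\Bigr\}\, (1 + \phi_1), \tag{6}
$$

where

$$
\chi_0^2 = \sum_{i,j=m+1}^{h} \bar{c}_{ij} z_i z_j, \qquad \phi_1 = O(1/\sqrt{n}),
$$

and $\|\bar{c}_{ij}\|$ is the inverse of the matrix obtained by deleting the first m rows and
first m columns from $\|c_{0ij}\|^{-1}$, $L_i$ being a linear function of
$z_{m+1}, \dots, z_h$, and $c_{0ij}$ is the value of $c_{ij}$ with $\theta_i = \theta_{0i}$, $i = m + 1, m + 2, \dots, h$,
that is, when H is true. Taking the maximum $P_\omega(O_n)$ of expression (6) with
respect to $\theta_1, \theta_2, \dots, \theta_m$, we get

$$
P_\omega(O_n) = \frac{|c_{0ij}|^{1/2}}{(2\pi)^{h/2}}\, e^{-\frac{1}{2}\chi_0^2}\, (1 + \phi_1'), \qquad \phi_1' = O(1/\sqrt{n}). \tag{7}
$$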


3 For conditions under which the $\tilde\theta$'s exist which are distributed according to (3), see
J. L. Doob, Probability and Statistics, Trans. Amer. Math. Soc., Vol. 36, pp. 750-776.


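Hence, when H is true, we have, from (5) and (7),

$$
\lambda = \frac{P_\omega(O_n)}{P_\Omega(O_n)} = e^{-\frac{1}{2}\chi_0^2}\, \bigl(1 + O(1/\sqrt{n})\bigr). \tag{8}
$$

Therefore, except for terms of order $1/\sqrt{n}$,

$$
-2 \log \lambda = \chi_0^2. \tag{9}
$$

Now, the characteristic function of $-2 \log \lambda$ is

$$
\varphi(t) = E\bigl(e^{-2it \log \lambda}\bigr) = \int \cdots \int \frac{|c_{0ij}|^{1/2}}{(2\pi)^{h/2}}\; e^{\,it\chi_0^2 - \frac{1}{2} \sum_{i,j=1}^{h} c_{0ij} z_i z_j}\, \bigl(1 + O(1/\sqrt{n})\bigr)\, dz_1 \cdots dz_h. \tag{10}
$$

It can be shown that on any finite interval of t, $\varphi(t)$ approaches uniformly,
as $n \to \infty$, the function

$$
(1 - 2it)^{-\frac{h - m}{2}}. \tag{11}
$$

But (11) is the characteristic function of any quantity distributed like $\chi^2$ with
$h - m$ degrees of freedom.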

We can summarize in the

Theorem: If a population with a variate x is distributed according to the probability
function $f(x, \theta_1, \theta_2, \dots, \theta_h)$, such that optimum estimates $\tilde\theta_i$ of the $\theta_i$ exist which
are distributed in large samples according to (3), then when the hypothesis H is
true that $\theta_i = \theta_{0i}$, $i = m + 1, m + 2, \dots, h$, the distribution of $-2 \log \lambda$, where $\lambda$
is given by (2) is, except for terms of order $1/\sqrt{n}$, distributed like $\chi^2$ with $h - m$
degrees of freedom.
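
As an illustration of how the theorem is used, the sketch below computes $-2 \log \lambda$ for one particular case that is assumed here and is not worked in the paper: a normal population with unknown mean and variance (h = 2), with H specifying the mean only (m = 1), so that the limiting distribution has h - m = 1 degree of freedom. The function name and the sample values are hypothetical.

```python
import math


def neg2_log_lambda(xs, mu0=0.0):
    """-2 log(lambda) for H: mean = mu0, normal sample, variance unknown."""
    n = len(xs)
    mean = sum(xs) / n
    # Maximum likelihood estimates of the variance over Omega and over omega.
    var_full = sum((x - mean) ** 2 for x in xs) / n
    var_null = sum((x - mu0) ** 2 for x in xs) / n

    def max_log_likelihood(var):
        # Log of P in (1) evaluated at the maximizing parameter values.
        return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

    return -2.0 * (max_log_likelihood(var_null) - max_log_likelihood(var_full))


sample = [0.3, -1.2, 0.8, 1.9, -0.4, 1.1, 0.6, -0.2]
statistic = neg2_log_lambda(sample, mu0=0.0)
# 3.841 is the upper 5 per cent point of chi-square with 1 degree of freedom.
print(statistic, statistic > 3.841)
```

In this case the statistic reduces to $n \log(\hat\sigma_0^2 / \hat\sigma^2)$, so the large-sample test refers that quantity to the chi-square table with one degree of freedom. For small n the exact distribution (here a function of Student's t) is of course preferable.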


Princeton University,
Princeton, N. J.



ON DIFFERENTIAL OPERATORS DEVELOPED BY O'TOOLE 

By M. Ziaud-Din


1. O'Toole in his paper 'Symmetric Functions and Symmetric Functions of
Symmetric Functions' [Ann. Statist. 2 (1931) 102-49], has expressed Monomial
Symmetric Functions $\sum \alpha_1^{p_1} \alpha_2^{p_2} \cdots$ in terms of power-sums, $s_r$.

The Monomial Symmetric Functions can be written in partition notation as
$(p_1^{k_1} p_2^{k_2} \cdots)$ where $k_1, k_2, \dots$ denote the repetitions of parts.

To express $(p_1^{k_1} p_2^{k_2} \cdots)$ as a function of $s_r$, O'Toole has developed operators
$d_r$ and $D_r$, connected by the formulae,¹


(A) 

(B) 


rdf ~ 


rlD r ® 


> _ d 

fellfejl ' •; 

£rl(fr<g ... 

fell fed * • * * 

where $k_1 p_1 + k_2 p_2 + \cdots = r$,

$k_1 + k_2 + \cdots = k$.

In this paper it will be shown that these operational relations are easily
deduced from the operators $d_r$ and $D_r$ of Hammond, used for expressing Monomial
Symmetric Functions as functions of Elementary Symmetric Functions, $a_r$.

For the sake of distinction I shall use $q_r$ and $Q_r$ for the operators employed by
O'Toole and keep $d_r$ and $D_r$ for Hammond's Operators.

MacMahon has dealt with Hammond's operators in his Combinatory Analysis,
Vol. I, Cambridge University Press (1915), where they are defined² as





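$$
d_r = \frac{\partial}{\partial a_r} + a_1 \frac{\partial}{\partial a_{r+1}} + a_2 \frac{\partial}{\partial a_{r+2}} + \cdots \tag{1}
$$

2. It is known³ that

$$
\log(1 - a_1 x + a_2 x^2 - a_3 x^3 + \cdots) = -\bigl(s_1 x + \tfrac{1}{2} s_2 x^2 + \tfrac{1}{3} s_3 x^3 + \cdots\bigr).
$$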


1 O'Toole, Loc. cit., p. 120.

2 MacMahon, Comb. Analysis, I, 27-28.
3 Ibid., p. 6.


Now operate on the right hand side with $d_r$, and with its equivalent in (1) on
the left hand side. Equating coefficients of $x^r$ on both sides, we obtain

$$
d_r s_r = (-1)^{r-1} r; \qquad d_r s_k = 0 \text{ when } r \neq k,
$$

which yields

$$
d_r = (-1)^{r-1} r \frac{\partial}{\partial s_r} = (-1)^{r-1} r q_r. \tag{2}
$$

The operator $Q_r$ behaves exactly like $D_r$. From the formula

$$
d_r - D_1 d_{r-1} + D_2 d_{r-2} - D_3 d_{r-3} + \cdots + (-1)^r r D_r = 0,
$$

which is in complete correspondence with Newton's recurrence relation, we
derive

$$
\begin{aligned}
d_1 &= D_1 \\
d_2 &= D_1^2 - 2D_2 \\
d_3 &= D_1^3 - 3D_1 D_2 + 3D_3.
\end{aligned} \tag{C}
$$
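
Since the relations (C) have the same shape as Newton's relations between power sums $s_r$ and elementary symmetric functions $a_r$, the pattern can be checked on ordinary numbers. The following is only a sketch of that numerical analogue (it does not implement the differential operators themselves), and the function names are illustrative.

```python
from itertools import combinations
from math import prod


def elementary(values, r):
    """Elementary symmetric function a_r of the given quantities."""
    return sum(prod(c) for c in combinations(values, r))


def power_sum(values, r):
    """Power sum s_r of the given quantities."""
    return sum(v ** r for v in values)


def newton_residual(values, r):
    """s_r - a_1 s_{r-1} + a_2 s_{r-2} - ... + (-1)^r r a_r, which should vanish."""
    total = power_sum(values, r)
    for j in range(1, r):
        total += (-1) ** j * elementary(values, j) * power_sum(values, r - j)
    return total + (-1) ** r * r * elementary(values, r)


roots = [2, -1, 3, 5]
a1, a2, a3 = (elementary(roots, r) for r in (1, 2, 3))
print([newton_residual(roots, r) for r in range(1, 7)])
print(power_sum(roots, 3) == a1 ** 3 - 3 * a1 * a2 + 3 * a3)
```

Every residual is zero, and the last line confirms $s_3 = a_1^3 - 3a_1 a_2 + 3a_3$, the numerical counterpart of the third line of (C).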


By multinomial theorem 

r fc]. I ! • • * 


using (2) we at once get 

rq, = (-1) 


feilfal • * * 


which is the result (A) obtained by O’Toole. 

From (C), D r follows in terms of d r and thence with (2) Q r can be expressed in 
terms of q r . Using multinomial theorem we arrive at 


r\Q r ~ 


... 

kdh\ ♦ • - 


which is (B), Hence both the results of O'Toole have been deduced. 


3. In his second paper⁵ O'Toole defines symmetric functions for more than
one system of Variates. I call such symmetric functions Hyper Symmetric
Functions.

The Hyper operators are developed to express Hyper symmetric functions in
terms of hyper power-sums. They are defined by O'Toole by the following
relations, taking into consideration two systems of variates only.


4 p. 29.

5 Ann. Stat. 3 (1932), 56-63.


(A)' 


(BO 


rf _Z (-«*'(*-1)1 


D Pi = 


L* W'Plfll W P2«2 ‘ * ‘ 


falfel r ' * 
where kpi + feps + • * • « p 
“f fyjn 4" 1 ' * = ? 

These relations readily follow from Macmahon's 4 hyper operators and 
G vq> These operators came into existence with the problem of expressing 
hyper symmetric functions in terms of hyper elementary symmetric functions 
and they are connected by the following relations, 

t ( (P"N-l)! fl _ £ (“l) M (fe - 1)1^1 rib 

(0 —?Iil— h "biff" " G ™ Qm 1 ■ 1 




w 


_ El(pi +Jl - 1)1]*'((?)»+?1-')!f ! (-1) W / \ku \t, 

Dtlftl ft!»l '"ww - {hm) {gvm) "" 


p\q\ 


Macmahon' has also shown that 

*.<» = (-D (+H 

from which wo get 

ffm * (-1) >+H 


(p + g — 1)1' 


p!«! 


(p+9 - iyi (3) 

The operator G behaves like D of O'Toole. Now using (3) we derive from (I)
the result (A') arrived at by O'Toole without reference to MacMahon. Similarly
from (II), using (3), (B') is deduced.


University College, Swansea, Wales.


4 Macmahon, Comb. Analysis, Vol. II, Cambridge University Press (1916), p. 302. 
1 Macmahon Op. Cit., p. 30i 



GRADUATION BY A TRUNCATED NORMAL 

By Nathan Keyfitz 

Below is a table for finding the constants of a truncated normal by the equation
of moments. Karl Pearson* gives such a table for the case in which the
data are to be fitted to the "tail" (i.e. less than half) of a normal curve but I do
not believe that the formulae for a distribution consisting of more than half of a
normal curve have before been tabulated.

The table below was calculated primarily for an investigation being carried
out on the duration of unemployment. The Canadian Census of 1931 reported
the number of persons losing 1-4, 5-8, ..., 49-52 weeks in the course of the year
June 1st, 1930 to June 1st, 1931, by various classifications (industry, province,
age, etc.).

The tendency to report even numbers of months on the part of the enumerated
population was evident in the result, and some kind of graduation was necessary
for an interpretation. After some experiment a part of a normal curve was
settled upon as the simplest and generally most satisfactory representation.

It was found that among the classes of workers in which unemployment is
high the curve is more advanced (i.e., the mode is at a higher number of weeks)
than in the classes where unemployment is low. In many cases (in most
groupings of female workers for instance) where unemployment is relatively very
low the modal point of the uncurtailed normal stands at a negative number of
weeks; for these cases the fitting is to a true tail and the tables of the Biometric
Laboratory were used.

Details of the results of the investigation will be published shortly in the
Unemployment Monograph of the Dominion Bureau of Statistics. Meanwhile,
this table will be of use as the complement of Pearson's tabulation which is only
suitable for $\psi_1 \geq .5708$.

Table for finding the constants of a truncated normal by the equation of moments

    x'      ψ1          Δψ1        ψ2          Δψ2
    0       .5708       -.0180     1.2533      -.0562
    .1      .5528       -.0183     1.1971      -.0543
    .2      .5345       -.0188     1.1428      -.0526
    .3      .5157       -.0190     1.0902      -.0506
    .4      .4967       -.0193     1.0396      -.0487
    .5      .4774       -.0195     .9909       -.0467
    .6      .4579       -.0195     .9442       -.0449
    .7      .4384       -.0196     .8993       -.0428
    .8      .4188       -.0196     .8566       -.0409
    .9      .3992       -.0194     .8156       -.0390
    1.0     .3798       -.0192     .7766       -.0370
    1.1     .3606       -.0189     .7396       -.0351
    1.2     .3417       -.0185     .7045       -.0332
    1.3     .3232       -.0180     .6713       -.0315
    1.4     .3052       -.0175     .6398       -.0296
    1.5     .2877       -.0170     .6102       -.0279
    1.6     .2707       -.0163     .5823       -.0263
    1.7     .2544       -.0156     .5560       -.0246
    1.8     .2388       -.0148     .5314       -.0232
    1.9     .2240       -.0141     .5082       -.0216
    2.0     .2099       -.0134     .4866       -.0204
    2.1     .1965       -.0126     .4662       -.0190
    2.2     .1839       -.0118     .4472       -.0178
    2.3     .1721       -.0110     .4294       -.0166
    2.4     .1611       -.0103     .4128       -.0156
    2.5     .1508       -.0096     .3972       -.0146
    2.6     .1412       -.0089     .3826       -.0137
    2.7     .1323       -.0083     .3689       -.0128
    2.8     .1240       -.0076     .3561       -.0120
    2.9     .1164       -.0071     .3441       -.0113
    3.0     .1093                  .3328
    3.5     .0813                  .2856
    4.0     .06246                 .24999
    4.5     .049379                .222221
    5.0     .0399997               .1999999


* Tables for Statisticians and Biometricians, page 25.




Let d = distance of centroid of actual distribution from point of truncation.

Let $\Sigma$ = standard deviation of distribution about its mean. Then $\psi_1 = \Sigma^2 / d^2$.

Hence corresponding x' and $\psi_2$ may be found.

Then $\sigma = d\psi_2$, where $\sigma$ = standard deviation of uncurtailed normal.

And $\bar{x} = x'\sigma$, where $\bar{x}$ = origin of uncurtailed normal.

N.B. The point of truncation is taken for the origin in the original distribution.
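
The tabulated functions can be recomputed directly, which also gives a check on individual entries. The sketch below is an assumption-laden reconstruction rather than the author's method of calculation: it takes x' to be the distance of the mode from the point of truncation in units of $\sigma$, uses the standard moments of a one-sided truncated normal, and replaces table look-up by bisection. The function names are hypothetical.

```python
import math


def truncated_normal_constants(x_prime):
    """Return (psi_1, psi_2) for a unit normal cut off x_prime sigmas below its mode."""
    pdf = math.exp(-0.5 * x_prime ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(x_prime / math.sqrt(2.0)))
    lam = pdf / cdf                      # centroid of the retained part, from the mode
    d = x_prime + lam                    # centroid, from the point of truncation
    variance = 1.0 - x_prime * lam - lam ** 2
    return variance / d ** 2, 1.0 / d    # psi_1 = Sigma^2 / d^2, psi_2 = sigma / d


def fit_truncated_normal(d, big_sigma):
    """Solve psi_1(x') = big_sigma^2 / d^2 by bisection; return (x', sigma, origin)."""
    target = big_sigma ** 2 / d ** 2
    lo, hi = 0.0, 5.0                    # range of x' covered by the table
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if truncated_normal_constants(mid)[0] > target:   # psi_1 falls as x' rises
            lo = mid
        else:
            hi = mid
    x_prime = 0.5 * (lo + hi)
    sigma = d * truncated_normal_constants(x_prime)[1]
    return x_prime, sigma, x_prime * sigma


print(truncated_normal_constants(0.0))
print(truncated_normal_constants(1.0))
```

At x' = 0 this gives $\psi_1 = \pi/2 - 1$ and $\psi_2 = \sqrt{\pi/2}$, and at x' = 1.0 it agrees with the tabulated .3798 and .7766 to four places. `fit_truncated_normal` carries out the steps described above for a given d and $\Sigma$, provided $\psi_1$ lies within the range of the table.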


Social Analysis Branch, Dominion Bureau of Statistics,

Ottawa, Canada 



REPORT OF THE ANNUAL MEETING OF THE INSTITUTE OF
MATHEMATICAL STATISTICS

The annual meeting of the Institute of Mathematical Statistics was held on
Wednesday and Thursday, December 29-30, 1937, in Indianapolis, Indiana, in
conjunction with the meetings of the American Mathematical Society and
associated organizations.

The Wednesday morning session was devoted to applications of statistics to
industry and engineering. On Thursday morning, the Institute held a joint
session with the Mathematical Society for the presentation of voluntary papers
on probability and statistics. This session was immediately followed by
another of the Institute for two invited addresses. These addresses were
"The theory of general means" by Professor E. L. Dodd, and "On the independence
of certain estimates of variance" by Professor A. T. Craig. Professor
P. R. Rider was in charge of arranging the program.

On Thursday noon, there was a luncheon at the Marott Hotel for members of
the Institute and their guests. After the luncheon, Professor H. L. Rietz spoke
on "The future of the Institute in relation to mathematical statistics."

At the business meeting, which followed the Wednesday morning session,
President Shewhart announced that these officers had been elected for 1938:
President, B. H. Camp, Wesleyan University; Vice-Presidents, P. R. Rider,
Washington University, and S. S. Wilks, Princeton University; Secretary-Treasurer,
A. T. Craig, University of Iowa. The Institute voted to hold its
1938 meeting with the American Statistical Association. The meeting will be
in Detroit, Michigan, in December of this year.

Allen T. Craig, Secretary.




The Official Journal of the Institute
of Mathematical Statistics


Contents

Tests of Statistical Hypotheses which are Unbiased in the Limit.
J. Neyman ... 69

The Transformation of Statistics to Simplify their Distribution.
Harold Hotelling and Lester R. Frankel ... 87

On Combined Expansions of Products of Symmetric Power Sums
and of Sums of Symmetric Power Products with Applications
to Sampling (Continued). Paul S. Dwyer ... 97

Distributions of Sums of Squares of Rank Differences for Small
Numbers of Individuals. E. G. Olds ... 133

Note on Correlations. D. DeLury ... 149


Vol. IX, No. 2, June, 1938






TESTS OF STATISTICAL HYPOTHESES WHICH ARE UNBIASED IN 

THE LIMIT 

By J. Neyman 

1. Introduction. The idea of unbiased tests of statistical hypotheses has
been put forward and discussed in two recent papers.¹ Recently also a particular
problem was solved introducing a test which has the property of being unbiased
in the limit.² The purpose of the present note is to discuss this conception in
its general form and to indicate methods of determining the tests unbiased in
the limit of a broad class of simple statistical hypotheses. The notation and
the terminology employed below are explained in the papers quoted.

2. Notation and definitions. Consider a set of n random variables

$$
X_1, X_2, \dots, X_n \tag{1}
$$

the particular values of which

$$
x_1, x_2, \dots, x_n \tag{2}
$$

can be given by observation and denote by $\Omega$ the set of hypotheses concerning
the probability law of (1) which are regarded as admissible. We shall assume
that all the hypotheses included in $\Omega$ specify the probability law of the X's
having the same analytical form but differing among them in the value of just
one parameter, $\theta$. Thus, if $E_n$ denotes the point (the "event point") in the
space $W_n$ of n dimensions with its coordinates equal to the values of (1) and $w_n$
any region in $W_n$, then the probability of $E_n$ falling within $w_n$, as determined
by any of the hypotheses forming the set $\Omega$, will be denoted by

$$
P\{E_n \in w_n \mid \theta\} \tag{3}
$$

and will be a function of the parameter $\theta$. The probability (3) with fixed $\theta$
considered as a function of varying $w_n$ is called the integral probability law of the
X's. Frequently (3) is equal to the integral of a certain non-negative function of
$E_n$ over the region $w_n$. This function will always be denoted by $p(E_n \mid \theta)$ and
called the elementary probability law of (1).


1 J. Neyman and E. S. Pearson: Contributions to the Theory of Testing Statistical
Hypotheses, Part I. Stat. Res. Memoirs, Vol. I (1936), pp. 1-37. Part II, ibid., Vol. II
(1938).

J. Neyman: Sur la vérification des hypothèses statistiques composées. Bull. Soc. Math.
de France, Vol. 63 (1935), pp. 246-260.

2 J. Neyman: "Smooth" Test for Goodness of Fit. Skandinavisk Aktuarietidskrift
(1937), pp. 149-199.


Denote by $H_0$ some particular hypothesis of the set $\Omega$ and by $\theta_0$ the value
that it ascribes to the parameter $\theta$.

A test of the statistical hypothesis $H_0$ consists in a rule of rejecting $H_0$ whenever
$E_n$ falls within a specified region $w_n$ and in not doing so in other cases.
The region $w_n$ used for this purpose is called the critical region. It follows
that to choose a test means to choose a critical region.

We shall consider below only cases such that for any region $w_n$ the probability
(3) considered as a function of $\theta$ possesses two successive derivatives.

Definition 1. If a critical region $\bar{w}_n$ has the property that, $\alpha$ being a fixed
positive number,

$$
\text{(a)} \quad P\{E_n \in \bar{w}_n \mid \theta_0\} = \alpha \tag{4}
$$

$$
\text{(b)} \quad \frac{\partial}{\partial\theta} P\{E_n \in \bar{w}_n \mid \theta\} \Big|_{\theta = \theta_0} = 0 \tag{5}
$$

$$
\text{(c)} \quad \frac{\partial^2}{\partial\theta^2} P\{E_n \in \bar{w}_n \mid \theta\} \Big|_{\theta = \theta_0} \geq \frac{\partial^2}{\partial\theta^2} P\{E_n \in w_n \mid \theta\} \Big|_{\theta = \theta_0} \tag{6}
$$

where $w_n$ is any region satisfying (a) and (b), then the region $\bar{w}_n$ is called the unbiased
critical region of type A corresponding to the level of significance $\alpha$, and the
test of the hypothesis $H_0$ based on $\bar{w}_n$, the unbiased test of type A.

This is the definition given in the first of the earlier papers quoted. Now we
shall define the test which is unbiased in the limit. For this purpose we shall
have to consider the situation where n is indefinitely increased and consequently
we have a sequence of probability laws (3), a sequence of spaces $W_n$ where they
are defined and a sequence of regions $w_n$, each $w_n$ being a part of the corresponding
$W_n$.

We must also introduce a varying scale with which to measure the differences
$\theta - \theta_0$. This is due to the fact that, if the choice of the sequence of regions
$w_n$ is not very unlucky and $\theta \neq \theta_0$, then we shall frequently have

$$
\lim_{n \to \infty} P\{E_n \in w_n \mid \theta\} = 1. \tag{7}
$$

Comparing this with condition (4), we see that in general the limit of

$$
P\{E_n \in w_n \mid \theta\}
$$

for $n \to \infty$ will be discontinuous at $\theta = \theta_0$. To avoid this we shall measure
$\theta - \theta_0$ in terms of $n^{-1/2}$, introducing instead of $\theta$ a new parameter $\vartheta$ connected
with the former by means of the equality

$$
\theta = \theta_0 + \frac{\vartheta}{\sqrt{n}}. \tag{8}
$$

For the hypothesis tested $H_0$ we shall have $\vartheta = 0$ and $\vartheta \neq 0$ for any other
hypothesis in $\Omega$. The new parameter $\vartheta$ thus introduced will be called the
standardized error in $H_0$. It will be frequently convenient to use $\theta$ but occasionally
we shall use $\vartheta$ as well, for example writing $P\{E_n \in w_n \mid \vartheta\}$ instead of
(3) etc., and it is necessary to remember the connection (8) existing between $\theta$
and $\vartheta$. It may be useful to notice at once that $\partial f / \partial\theta = \sqrt{n}\; \partial f / \partial\vartheta$.

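Definition 2. We shall say that the sequence of regions

(9) $\bar{w}_1, \bar{w}_2, \ldots, \bar{w}_n, \ldots$

determines a test of the hypothesis $H_0$ which is unbiased in the limit and corresponds
(in the limit) to the level of significance $\alpha$, if for any $n$

(10) (d) $\left.\dfrac{d^2}{d\vartheta^2} P\{E_n \in \bar{w}_n \mid \vartheta\}\right|_{\vartheta=0} \ge \left.\dfrac{d^2}{d\vartheta^2} P\{E_n \in w_n \mid \vartheta\}\right|_{\vartheta=0}$

where $w_n$ is any region such that

(11) $P\{E_n \in w_n \mid \vartheta = 0\} = P\{E_n \in \bar{w}_n \mid \vartheta = 0\}$

and

(12) $\left.\dfrac{d}{d\vartheta} P\{E_n \in w_n \mid \vartheta\}\right|_{\vartheta=0} = \left.\dfrac{d}{d\vartheta} P\{E_n \in \bar{w}_n \mid \vartheta\}\right|_{\vartheta=0}$

and if

(13) (e) $\lim_{n\to\infty} P\{E_n \in \bar{w}_n \mid \vartheta = 0\} = \alpha$

(14) (f) $\lim_{n\to\infty} \left.\dfrac{d}{d\vartheta} P\{E_n \in \bar{w}_n \mid \vartheta\}\right|_{\vartheta=0} = 0$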

The practical application of the test determined by the sequence of regions (9)
consists in observing as large a number $n$ of the X's of (1) as possible and in rejecting the
hypothesis $H_0$ whenever $E_n$ falls within $\bar{w}_n$. If $n$ is sufficiently large, then this
rule will have about the same advantages as the application of the unbiased test
of type A. In fact, allowing for the circumstance that the values of (11) and
(12) will be only approximately equal to the limits (13) and (14), the properties
of the test satisfying Definition 2 will be as follows: If the hypothesis tested
be true, it will be wrongly rejected with a relative frequency approximately
equal to $\alpha$ fixed in advance. If $H_0$ is false and the true value, say, of $\vartheta$ is not
very different from zero, then the frequency of rejecting $H_0$ will be greater than
$\alpha$ and could not be increased by applying some other similar test.

It may be useful to notice that in general there may be more than one test 
of the same hypothesis which is unbiased in the limit and corresponds to a 
fixed level of significance. Consequently there is a possibility of choosing 
between such tests, but it seems to the author that such a choice would require a 
previous strengthening of the theorem of S. Bernstein on which the present 
work is based. 





J. NEYMAN


3. Theorem of S. Bernstein. In the following, we shall have to use the
following particular case of a theorem due to S. Bernstein.$^3$ Denote by $\mathcal{E}(x)$
the mathematical expectation of any variate $x$ and by

(15) $X_1, X_2, \ldots, X_n, \ldots$; $Y_1, Y_2, \ldots, Y_n, \ldots$

two unlimited sequences of random variables.

We shall assume that

(1) $X_i$ is independent of $X_j$ and $Y_j$ for any $i \ne j$.

(2) The following mathematical expectations exist and are independent of $i$:

(16) $\mathcal{E}(X_i) = a$, $\mathcal{E}(Y_i) = b$

$\mathcal{E}(X_i - a)^2 = \sigma_1^2$, $\mathcal{E}(Y_i - b)^2 = \sigma_2^2$

$\mathcal{E}\{(X_i - a)(Y_i - b)\} = r\sigma_1\sigma_2$

$\mathcal{E}(|X_i - a|^3) = \mu$, $\mathcal{E}(|Y_i - b|^3) = \nu$

Consider now the space of $2n$ dimensions $W_n$ and denote by $E_n$ a point in it
as determined by the values of $X_i$, $Y_i$ for $i = 1, 2, \ldots, n$ considered as its
coordinates. Let $u_n$ and $v_n$ denote the sums

(17) $u_n = \sum_{i=1}^{n} X_i$, $v_n = \sum_{i=1}^{n} Y_i$

and denote by $D_n$ the point on a plane $S$ with its orthogonal coordinates equal
to $u_n$ and $v_n$. If $s$ is any region in $S$ then let $P\{D_n \in s\}$ be the probability of $D_n$
falling within $s$.

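Theorem of S. Bernstein. If the variates (15) satisfy the conditions (1)
and (2) then, for any $\epsilon > 0$, there exists a number $N_\epsilon$, such that the inequality
$n > N_\epsilon$ implies

(18) $\left| P\{D_n \in s\} - \dfrac{1}{2\pi n\sigma_1\sigma_2\sqrt{1-r^2}} \iint_s \exp\left\{-\dfrac{1}{2n(1-r^2)}\left[\dfrac{(u-na)^2}{\sigma_1^2} - 2r\dfrac{(u-na)(v-nb)}{\sigma_1\sigma_2} + \dfrac{(v-nb)^2}{\sigma_2^2}\right]\right\} du\,dv \right| < \epsilon$

whatever the region $s$ in $S$ may be.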
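The content of the theorem can be checked empirically. The following is only an illustrative sketch, not part of the original argument: the choice $X_i$ uniform on $(-1, 1)$ with $Y_i = X_i^2$ is an assumption made here (it gives $a = 0$, $b = 1/3$, $\sigma_1^2 = 1/3$, $\sigma_2^2 = 4/45$, $r = 0$), and the function names are hypothetical. For a quadrant-shaped region $s$ the integral of the normal surface in (18) factorises when $r = 0$, so it can be compared directly with a simulated relative frequency.

```python
import math
import random


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def simulated_probability(n, c1, c2, reps=4000, seed=1):
    """Relative frequency of D_n = (u_n, v_n) falling within the region
    s = {u <= n*a + c1*sigma1*sqrt(n), v <= n*b + c2*sigma2*sqrt(n)}."""
    rng = random.Random(seed)
    a, b = 0.0, 1.0 / 3.0              # E(X_i), E(Y_i) for X uniform(-1, 1), Y = X^2
    sigma1 = math.sqrt(1.0 / 3.0)      # S.D. of X_i
    sigma2 = math.sqrt(4.0 / 45.0)     # S.D. of Y_i = X_i^2
    u_limit = n * a + c1 * sigma1 * math.sqrt(n)
    v_limit = n * b + c2 * sigma2 * math.sqrt(n)
    hits = 0
    for _ in range(reps):
        u_n = 0.0
        v_n = 0.0
        for _ in range(n):
            x = rng.uniform(-1.0, 1.0)
            u_n += x
            v_n += x * x
        if u_n <= u_limit and v_n <= v_limit:
            hits += 1
    return hits / reps


def normal_probability(c1, c2):
    """Integral of the limiting normal surface over s; for r = 0 it factorises."""
    return normal_cdf(c1) * normal_cdf(c2)
```

With, say, `n = 200` the two functions should agree to within sampling error; the agreement is not a proof of (18), only a numerical illustration for one particular region.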


4. Tests unbiased in the limit. We shall consider the problem of determining
the tests satisfying Definition 2, in the case where the following hypotheses are 
fulfilled. 


3 S. Bernstein: Sur un théorème limite du calcul des probabilités. Math. Ann., Bd. 97
(1926) p. 44.

See also V. Romanovskij, Bull. de l'Académie des Sciences de l'U. R. S. S., 1929, p. 209
and W. Kozakiewicz, Ann. Soc. Polonaise Math., t. XIII (1934), pp. 24-43.





(i) All the random variables (1) are mutually independent and each of them
follows the same elementary probability law which we shall denote by $p(x_i \mid \theta)$.

(ii) The elementary probability law $p(x_i \mid \theta)$ admits three differentiations
with respect to $\theta$, and two consecutive differentiations with respect to $\theta$ under
the integral taken over any fixed finite or infinite interval, so that

(19) $\dfrac{d^k}{d\theta^k}\int_a^b p(x_i \mid \theta)\,dx_i = \int_a^b \dfrac{\partial^k}{\partial\theta^k}\, p(x_i \mid \theta)\,dx_i$

for $k = 1, 2$.



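(iii) If

(20) $\varphi_i = \left.\dfrac{\partial \log p(x_i \mid \theta)}{\partial\theta}\right|_{\theta=\theta_0}$, $\psi_i = \left.\dfrac{\partial^2 \log p(x_i \mid \theta)}{\partial\theta^2}\right|_{\theta=\theta_0}$

then we shall assume the existence of the following integrals, all taken from
$-\infty$ to $+\infty$:

(21) $\sigma_1^2 = \int \varphi_i^2\, p(x_i \mid \theta_0)\,dx_i$

(22) $\sigma_2^2 = \int (\psi_i + \sigma_1^2)^2\, p(x_i \mid \theta_0)\,dx_i$

(23) $r\sigma_1\sigma_2 = \int \varphi_i\psi_i\, p(x_i \mid \theta_0)\,dx_i$

(24) $\mu = \int |\varphi_i|^3\, p(x_i \mid \theta_0)\,dx_i$

(25) $\nu = \int |\psi_i + \sigma_1^2|^3\, p(x_i \mid \theta_0)\,dx_i$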


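Proposition I. If the above conditions (i), (ii) and (iii) are satisfied, $\psi_i$
being a function of $x_i$, and $|r| < 1$, then the sequence of regions $\bar{w}_n$ including all
the points of $W_n$ where $p(E_n \mid \theta_0) = 0$ and also those of the remaining ones which
satisfy the inequality

(26) $\sum_{i=1}^{n}\psi_i + \left(\sum_{i=1}^{n}\varphi_i\right)^2 \ge M\sigma_2\sqrt{n(1-r^2)} - n\sigma_1^2 + r\dfrac{\sigma_2}{\sigma_1}\sum_{i=1}^{n}\varphi_i$

where the coefficient $M$ is to be found from the equation

(27) $\dfrac{1}{2\pi}\iint_{y \ge M - Nx^2} e^{-\frac{1}{2}(x^2+y^2)}\,dx\,dy = \alpha$

(28) $N = \dfrac{\sigma_1^2\sqrt{n}}{\sigma_2\sqrt{1-r^2}}$

defines a test of the hypothesis $H_0$, which is unbiased in the limit and corresponds
(in the limit) to the level of significance $\alpha$.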

Remark. The calculation of $M$ satisfying the equation (27) is, of course,
laborious. But a table of values of $M$ corresponding to varying values of $N$
is being constructed by N. L. Johnson at the Department of Statistics, University
College, London, and it is hoped that it will soon be published.
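With modern means the computation is short. The sketch below is not the method of the tables mentioned above. It assumes that equation (27) has the form $\frac{1}{2\pi}\iint_{y \ge M - Nx^2} e^{-(x^2+y^2)/2}\,dx\,dy = \alpha$, which is the form implied by the limits of integration appearing in (57); the function names, the integration range and the bisection bounds are choices made here for illustration.

```python
import math


def upper_tail(z):
    """P(Y >= z) for a standard normal variate Y."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def size_of_region(M, N, steps=4000, half_width=8.0):
    """Simpson's rule for the integral over x of the standard normal density
    multiplied by P(Y >= M - N*x^2); this is the assumed left side of (27)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += weight * density * upper_tail(M - N * x * x)
    return total * h / 3.0


def solve_M(alpha, N, tol=1e-10):
    """Bisection on M; size_of_region decreases as M increases."""
    lo, hi = -10.0, 10.0 + 64.0 * N
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if size_of_region(mid, N) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $N = 0$ the region reduces to $y \ge M$, so `solve_M(alpha, 0.0)` should return the ordinary one-sided normal deviate, which gives a simple check of the routine.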

To prove Proposition I, we must first prove (a) that, whatever $n$, the region
$\bar{w}_n$ determined by the inequality (26) satisfies the condition (d) in Definition
2. The proof is based on the following Lemma.$^4$

Lemma. If $F_0, F_1, \ldots, F_m$ are functions of $x_1, \ldots, x_n$ integrable over any
region in $W_n$ and $w_0$ a region in $W_n$ such that within $w_0$

(29) $F_0 \ge \sum_{i=1}^{m} a_i F_i$

while outside of $w_0$

(30) $F_0 \le \sum_{i=1}^{m} a_i F_i$

$a_1, a_2, \ldots, a_m$ being some constant coefficients, then, whatever may be any other
region $w$ in $W_n$, such that

(31) $\int\cdots\int_{w} F_i\,dx_1\cdots dx_n = \int\cdots\int_{w_0} F_i\,dx_1\cdots dx_n$, for $i = 1, 2, \ldots, m$,

we shall have

(32) $\int\cdots\int_{w_0} F_0\,dx_1\cdots dx_n \ge \int\cdots\int_{w} F_0\,dx_1\cdots dx_n$.


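Proof of Proposition I. Denote, for simplicity, by $p(E_n)$ the elementary
probability law of the X's as determined by the hypothesis tested.
Comparing the statement of the Lemma with the definition (26) of $\bar{w}_n$, we
immediately see that this region has the following property: whatever may be
any other region $w$ in $W_n$ such that

(33) $\int\cdots\int_{w} p(E_n)\,dx_1\cdots dx_n = \int\cdots\int_{\bar{w}_n} p(E_n)\,dx_1\cdots dx_n$

and

(34) $\dfrac{1}{\sqrt{n}}\int\cdots\int_{w} \sum_{i=1}^{n}\varphi_i\, p(E_n)\,dx_1\cdots dx_n = \dfrac{1}{\sqrt{n}}\int\cdots\int_{\bar{w}_n} \sum_{i=1}^{n}\varphi_i\, p(E_n)\,dx_1\cdots dx_n$

we shall have

(35) $\dfrac{1}{n}\int\cdots\int_{\bar{w}_n} \left\{\sum_{i=1}^{n}\psi_i + \left(\sum_{i=1}^{n}\varphi_i\right)^2\right\} p(E_n)\,dx_1\cdots dx_n \ge \dfrac{1}{n}\int\cdots\int_{w} \left\{\sum_{i=1}^{n}\psi_i + \left(\sum_{i=1}^{n}\varphi_i\right)^2\right\} p(E_n)\,dx_1\cdots dx_n$

4 J. Neyman and E. S. Pearson: Loc. cit., pp. 10-11.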


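But under the conditions (i) and (ii)

(36) $p(E_n \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$

(37) $\left.\dfrac{\partial p(E_n \mid \vartheta)}{\partial\vartheta}\right|_{\vartheta=0} = \dfrac{1}{\sqrt{n}}\sum_{i=1}^{n}\varphi_i\, p(E_n)$

(38) $\left.\dfrac{\partial^2 p(E_n \mid \vartheta)}{\partial\vartheta^2}\right|_{\vartheta=0} = \dfrac{1}{n}\left\{\sum_{i=1}^{n}\psi_i + \left(\sum_{i=1}^{n}\varphi_i\right)^2\right\} p(E_n)$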


and it is easily seen that the relations (33), (34) and (35) are identical with
(11), (12) and (10) respectively and that therefore the region $\bar{w}_n$ satisfies the
condition (d) of Definition 2. It remains to prove that $\bar{w}_n$ satisfies also the
conditions (e) and (f), that is to say that, for $n \to \infty$, the formulas in the right
hand sides of (33) and (34) tend to the prescribed values $\alpha$ and zero respectively.
This conclusion concerning (33) is a consequence of the theorem of S. Bernstein,
quoted above. To see this, write

(39) $u_n = \sum_{i=1}^{n}\varphi_i$, $v_n = \sum_{i=1}^{n}\psi_i$

and denote by $s_0$ the region in the plane $S$ of $(u, v)$ defined by the inequality

(40) $v + u^2 \ge M\sigma_2\sqrt{n(1-r^2)} - n\sigma_1^2 + r\dfrac{\sigma_2}{\sigma_1}u$

obtained from (26) by means of (39). The right hand side of (33) represents
the probability determined by the hypothesis tested of the X's satisfying the
inequality (26). But this is satisfied simultaneously with the variates $u_n$ and $v_n$
satisfying (40). Therefore, if we denote by $D_n$ the point in $S$ with its coordinates
equal to (39), then the right hand side of (33) may be interpreted as the probability
$P\{D_n \in s_0\}$ of $D_n$ falling within $s_0$. Comparing (21)-(25) with (16), it is
easily seen that, according to the Theorem of S. Bernstein, whatever may be
$\epsilon > 0$, if $n$ is sufficiently large, then


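(41) $\left| P\{D_n \in s_0\} - \iint_{s_0} G_n\,du\,dv \right| < \epsilon$

where

(42) $G_n = \dfrac{1}{2\pi n\sigma_1\sigma_2\sqrt{1-r^2}} \exp\left\{-\dfrac{1}{2n(1-r^2)}\left[\dfrac{u^2}{\sigma_1^2} - 2r\dfrac{u(v+n\sigma_1^2)}{\sigma_1\sigma_2} + \dfrac{(v+n\sigma_1^2)^2}{\sigma_2^2}\right]\right\}$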





In fact, to what is given explicitly, we must only add that as

(43) $\int p(x_i \mid \theta)\,dx_i \equiv 1$

the derivative with respect to $\theta$ of the left hand side must be identically equal to
zero. Therefore

(44) $\left.\dfrac{\partial}{\partial\theta}\int p(x_i \mid \theta)\,dx_i\right|_{\theta=\theta_0} = \int \varphi_i\, p(x_i \mid \theta_0)\,dx_i = \mathcal{E}(\varphi_i) = 0$

where again the integrals are taken from $-\infty$ to $+\infty$. It follows further that
the second derivative with respect to $\theta$ of (43) must be again identically equal to
zero. Therefore, keeping in mind the definitions of $\varphi_i$ and $\psi_i$, we may write

(45) $\left.\dfrac{\partial^2}{\partial\theta^2}\int p(x_i \mid \theta)\,dx_i\right|_{\theta=\theta_0} = \int (\psi_i + \varphi_i^2)\, p(x_i \mid \theta_0)\,dx_i = 0$

and thus

(46) $\mathcal{E}(\psi_i) = -\sigma_1^2$

so that, in the notation of (16), $a = 0$ and $b = -\sigma_1^2$.

The proof that the right hand side of (33) tends to $\alpha$ with $n \to \infty$ will be
completed if we manage to reduce the integral of (42) over the region $s_0$ to the
integral (27). This is easily done by substituting

(47) $x = \dfrac{u}{\sigma_1\sqrt{n}}$

(48) $y = \dfrac{v + n\sigma_1^2 - r\sigma_2 u/\sigma_1}{\sigma_2\sqrt{n(1-r^2)}}$

Thus, if the coefficient $M$ in (26) and (40) satisfies the condition (27), then the
value of the integral of $G_n$ in (41) is permanently equal to $\alpha$ and this means
that the right hand side of (33) tends to $\alpha$ as $n \to \infty$.

Denote by $p_n(u, v)$ the elementary probability law of $u_n$ and $v_n$. It will be
noticed that, whatever $s$ in $S$

(49) $P\{D_n \in s\} = \iint_s p_n(u, v)\,du\,dv$

and that consequently in the course of the above discussion we have proved
that, whatever $\epsilon > 0$, there exists a sufficiently large number $N_\epsilon$ such that
$n > N_\epsilon$ implies

(50) $\left|\iint_s (p_n(u, v) - G_n)\,du\,dv\right| < \epsilon$





whatever may be the region $s$ in $S$. We shall now use this circumstance to
prove that, when $n \to \infty$, the right hand side of (34) tends to zero. It will be
noticed first that

(51) $\int\cdots\int_{\bar{w}_n} \left(\sum_{i=1}^{n}\varphi_i\right)^k p(E_n)\,dx_1\cdots dx_n = \iint_{s_0} u^k\, p_n(u, v)\,du\,dv$

for $k = 1, 2$. Further

(52) $\iint_{s_0} u^2\, p_n(u, v)\,du\,dv \le \iint_{S} u^2\, p_n(u, v)\,du\,dv = n\sigma_1^2$

Using the inequality of Schwarz,$^5$ we may write

(53) $\dfrac{1}{n}\left\{\iint_{s_0} u\,(p_n(u, v) - G_n)\,du\,dv\right\}^2 \le \dfrac{1}{n}\iint_{s_0} u^2\,|p_n(u, v) - G_n|\,du\,dv \iint_{s_0} |p_n(u, v) - G_n|\,du\,dv$

Now, it is easy to calculate that

(54) $\iint_{s_0} u^2\,|p_n(u, v) - G_n|\,du\,dv \le 2n\sigma_1^2$

On the other hand, if $n$ is so large that (50) holds good for any region $s$ in $S$ and
$s_+$ and $s_-$ denote the two parts of $s_0$ where $p_n(u, v) - G_n$ is respectively positive
and negative, then

(55) $0 \le \iint_{s_0} |p_n(u, v) - G_n|\,du\,dv = \iint_{s_+} (p_n(u, v) - G_n)\,du\,dv - \iint_{s_-} (p_n(u, v) - G_n)\,du\,dv < 2\epsilon$

and it follows that, for such large values of $n$,

(56) $\left|\dfrac{1}{\sqrt{n}}\iint_{s_0} u\,(p_n(u, v) - G_n)\,du\,dv\right| < 2\sigma_1\sqrt{\epsilon}$

On the other hand, using the transformation (47) and (48), we find that

(57) $\dfrac{1}{\sigma_1\sqrt{n}}\iint_{s_0} u\,G_n\,du\,dv = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}\left\{x e^{-\frac{1}{2}x^2}\,\dfrac{1}{\sqrt{2\pi}}\int_{M-Nx^2}^{+\infty} e^{-\frac{1}{2}y^2}\,dy\right\}dx$

and consequently is permanently equal to zero, the integrand being an odd function of $x$. As $\epsilon$ is an arbitrarily small
number, it follows that

(58) $\lim_{n\to\infty}\dfrac{1}{\sqrt{n}}\iint_{s_0} u\,p_n(u, v)\,du\,dv = \lim_{n\to\infty}\dfrac{1}{\sqrt{n}}\int\cdots\int_{\bar{w}_n}\sum_{i=1}^{n}\varphi_i\,p(E_n)\,dx_1\cdots dx_n = 0$

which completes the proof of Proposition I.

5 See for example S. Kaczmarz and H. Steinhaus, Theorie der Orthogonalreihen,
Warsaw, 1935, p. 10.





Proposition II. If the conditions of Proposition I are satisfied but either
$|r| = 1$ or $\psi_i$ is independent of $x_i$, then the test of the hypothesis $H_0$ which is
unbiased in the limit and which corresponds to the level of significance $\alpha$ is determined
by the sequence of critical regions $\bar{w}_n$, defined by the inequality

(59) $\left|\sum_{i=1}^{n}\varphi_i\right| \ge \lambda\sigma_1\sqrt{n}$

where $\lambda$ satisfies the equation

(60) $\dfrac{1}{\sqrt{2\pi}}\int_{-\lambda}^{+\lambda} e^{-\frac{1}{2}x^2}\,dx = 1 - \alpha$
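As an illustration of how the rule of Proposition II might be applied numerically, the following sketch solves the normal-integral equation $\frac{1}{\sqrt{2\pi}}\int_{-\lambda}^{\lambda} e^{-x^2/2}dx = 1 - \alpha$ for $\lambda$ by bisection and then applies the inequality $|\sum\varphi_i| \ge \lambda\sigma_1\sqrt{n}$. The function names are hypothetical, and the values of $\varphi_i$ and $\sigma_1$ are supposed to have been computed beforehand from the probability law under test.

```python
import math


def central_mass(lam):
    """Normal probability of the interval (-lam, +lam)."""
    return math.erf(lam / math.sqrt(2.0))


def solve_lambda(alpha, tol=1e-12):
    """Bisection for the lambda that leaves probability alpha outside (-lam, lam)."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if central_mass(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rejects(phi_values, sigma1, lam):
    """True when the sample point falls in the critical region |sum(phi_i)| >= lam*sigma1*sqrt(n)."""
    n = len(phi_values)
    return abs(sum(phi_values)) >= lam * sigma1 * math.sqrt(n)
```

In practice `solve_lambda` only reproduces what the usual tables of the normal integral give; it is included so that the example is self-contained.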

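Proof. We notice first that the condition $|r| = 1$ and the equation (44)
imply

(61) $\left(\int \varphi_i(\psi_i + \sigma_1^2)\,p(x_i \mid \theta_0)\,dx_i\right)^2 = \int \varphi_i^2\,p(x_i \mid \theta_0)\,dx_i \int (\psi_i + \sigma_1^2)^2\,p(x_i \mid \theta_0)\,dx_i$

or

(62) $\dfrac{\int \varphi_i(\psi_i + \sigma_1^2)\,p(x_i \mid \theta_0)\,dx_i}{\int \varphi_i^2\,p(x_i \mid \theta_0)\,dx_i} = \dfrac{\int (\psi_i + \sigma_1^2)^2\,p(x_i \mid \theta_0)\,dx_i}{\int \varphi_i(\psi_i + \sigma_1^2)\,p(x_i \mid \theta_0)\,dx_i} = A$

and therefore

(63) $\int \{(\psi_i + \sigma_1^2)^2 - A\varphi_i(\psi_i + \sigma_1^2)\}\,p(x_i \mid \theta_0)\,dx_i = 0$

(64) $\int \{\varphi_i(\psi_i + \sigma_1^2) - A\varphi_i^2\}\,p(x_i \mid \theta_0)\,dx_i = 0$

and finally, subtracting $A$ times (64) from (63),

(65) $\int (\psi_i + \sigma_1^2 - A\varphi_i)^2\,p(x_i \mid \theta_0)\,dx_i = 0$

which means that at almost every value of $x_i$ for which $p(x_i \mid \theta_0) \ne 0$,

(66) $\psi_i + \sigma_1^2 = A\varphi_i$

It follows that the inequality (10) in the Definition 2 of the test which is
unbiased in the limit reduces to the following

(67) $\int\cdots\int_{\bar{w}_n} \left(\sum_{i=1}^{n}\varphi_i\right)^2 p(E_n \mid \theta_0)\,dx_1\cdots dx_n \ge \int\cdots\int_{w_n} \left(\sum_{i=1}^{n}\varphi_i\right)^2 p(E_n \mid \theta_0)\,dx_1\cdots dx_n$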



owing to (11), (12), (37) and (38). On the other hand, the inequality (59)
is equivalent to

(68) $\left(\sum_{i=1}^{n}\varphi_i\right)^2 \ge a + b\sum_{i=1}^{n}\varphi_i$

with $a = \lambda^2 n\sigma_1^2$ and $b = 0$. Referring to the Lemma, we conclude that the
regions $\bar{w}_n$ satisfy the condition (d) of the Definition 2. It remains to show
that they satisfy also the conditions (e) and (f). This immediately follows
from the theorem of Liapounoff$^6$ and the reasoning which we used above in
order to prove (58).

If $\psi_i$ does not depend on $x_i$ then, owing to (38) and (11), the inequality (10)
immediately reduces to (67) and the proof of Proposition II follows exactly
the same lines as before.

5. Limiting power function. To know the properties of a test undoubtedly
means to know (i) how frequently this particular test will reject the hypothesis
tested when it is in fact true and (ii) how frequently it will detect its falsehood
when it is wrong. Information of this kind is provided by the properties
of the so-called power function of the test. This has been defined$^7$ as follows.
Let $w_n$ be any critical region and, as formerly, $P\{E_n \in w_n \mid \theta\}$ the probability of
$E_n$ falling within $w_n$ as determined by a specified value of $\theta$. If $w_n$ is fixed,
then $P\{E_n \in w_n \mid \theta\}$ will be a function of $\theta$ only. To emphasize this circumstance
we may introduce a new symbol, writing

(69) $P\{E_n \in w_n \mid \theta\} = \beta(\theta \mid w_n)$

which will mean that in the above formula $w_n$ is kept constant and $\theta$ varied.
The function $\beta(\theta \mid w_n)$ thus defined is called the power function of the critical
region $w_n$ or that of the test based on $w_n$. If $w_n$ corresponds to the level of
significance $\alpha$ and $\theta_0$ is the value of $\theta$ specified by the hypothesis tested $H_0$, then

(70) $\beta(\theta_0 \mid w_n) = \alpha$

and it will be noticed that this is the probability of rejecting $H_0$ when it is in
fact true. As we reject $H_0$ only in such cases when $E_n \in w_n$, the values of
$\beta(\theta \mid w_n)$ corresponding to other values of $\theta \ne \theta_0$ are equal to the probability of
detecting the falsehood of the hypothesis $H_0$ when $\theta$ has any specified value
different from $\theta_0$. The larger the value of $\beta(\theta \mid w_n)$ at a given $\theta$, the greater
will be the "detecting power" of the test, which justifies the name attached to
the function $\beta(\theta \mid w_n)$. Until the present time the power function of only a few
tests has been studied and it follows that we know comparatively little of the
properties of the tests even if they are in frequent use. The first study of this
kind was concerned with the power function of the "Student's" test as applied
to the problem of one sample and there are three publications giving various
numerical tables.$^8$

6 See for example Paul Lévy: Théorie de l'addition des variables aléatoires. Paris,
1937, pp. 101-107.

7 J. Neyman and E. S. Pearson: loc. cit., p. 9.

However, in these publications the term "power function"
does not appear yet. Apart from the joint paper already referred to where
the term "power function" was first defined, we may mention a few papers in
Biometrika, the most important of which seems to be that by S. S. Wilks and
Catherine M. Thompson.$^9$ The purpose of studying the power function of any
test is to be able to answer the following three questions:

(a) What should be the size of a sample in order to have a reasonable chance
of detecting the falsehood of the hypothesis tested, when the error in the
parameters that it specifies has some stated value?

(b) If in some particular case a test failed to reject the hypothesis tested
(which, of course, does not mean that it is necessarily true), is it likely that the
error in $\theta_0$ does not exceed some specified limit $\Delta$?

(c) Two different tests corresponding to the same level of significance are
suggested for the same hypothesis $H_0$; which shall we use?

In this last case the answer is obvious: the one which gives the greater
chance of detecting the falsehood of the hypothesis tested in cases when it is
wrong. But to know this we must know the power functions of both tests.

For the above reasons it seems to be important to study the power function
of the test unbiased in the limit as defined above. It is obvious that, as in this
case the elementary probability laws are not specified, it is impossible to find
the actual explicit formula giving the power function. Therefore we shall
endeavour to find its limiting form. This will be done by means of the two
following theorems.

Consider an infinite sequence of situations

(71) $S_1, S_2, \ldots, S_m, \ldots$

In each of these situations we shall have to test the same hypothesis $H_0$
concerning the probability law $p(x \mid \theta)$ and specifying the value $\theta_0$ of $\theta$. The
situations differ among themselves by the number of the X's and by the hypotheses,
alternative to $H_0$, which are considered. For the situation $S_m$ we shall denote
them by $n_m$ and $H_m$ respectively. We shall assume that $\lim n_m = +\infty$ when
$m \to \infty$. As to the hypothesis $H_m$, we shall assume that the value $\theta_m$ which it
ascribes to the parameter $\theta$ is

(72) $\theta_m = \theta_0 + \dfrac{\vartheta}{\sqrt{n_m}}$

where $\vartheta$, the standardized error in $\theta_0$, is kept constant. We shall assume that
in each situation we test the hypothesis $H_0$ by means of the test unbiased in
the limit and corresponding to the level of significance $\alpha$. The power function
of this test should be denoted by $\beta(\theta \mid \bar{w}_{n_m})$, but to simplify the notation we will
write simply $\beta_m(\theta)$. We shall be concerned with the value of this function
$\beta_m(\theta_m)$ at the point $\theta = \theta_m$ and we shall prove the following proposition.

8 (1) S. Kołodziejczyk: Sur l'erreur de la seconde catégorie dans le problème de "Student."
C. R. Académie des Sciences, Paris, t. 197 (1933) p. 814.

(2) J. Neyman with co-operation of K. Iwaszkiewicz and S. Kołodziejczyk: Statistical
Problems in Agricultural Experimentation. Suppl. Journ. Roy. Stat. Soc. Vol. II (1935)
pp. 107-180.

(3) J. Neyman and B. Tokarska: Errors of Second Kind in Testing "Student's"
Hypothesis. J. A. S. A., Vol. 31 (1936) pp. 320-334.

9 S. S. Wilks and Catherine M. Thompson: The Sampling Distribution of the Criterion
$\lambda_{H_1}$ when the Hypothesis Tested is not true. Biometrika, Vol. XXIX (1937), pp. 124-132.

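Proposition III. If the third logarithmic derivative of $p(x_i \mid \theta)$ with
respect to $\theta$ is bounded

(73) $\left|\dfrac{\partial^3 \log p(x_i \mid \theta)}{\partial\theta^3}\right| < Q = \text{constant},$

and $|r| < 1$, then

(74) $\lim_{m\to\infty}\beta_m(\theta_m) = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x-\vartheta\sigma_1)^2}\left\{\dfrac{1}{\sqrt{2\pi}}\int_{M-Nx^2}^{+\infty} e^{-\frac{1}{2}y^2}\,dy\right\}dx$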


This proposition is analogous to that$^{10}$ concerning the "smooth" test for
goodness of fit. It could be used in the following manner.

When testing the hypothesis $H_0$ and using for the purpose a certain number $n$
of observations, we find ourselves in a situation which might be considered as
one of the sequence (71). If $n$ is large, we may hope that the right hand side of
(74) will give a reasonable approximation to the actual value of the power
function corresponding to the value of $\vartheta$ to be calculated from (72) by
substituting in it $n_m = n$.
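The use just described can be sketched numerically. The code below is illustrative only: it assumes that the limiting power has the form $\frac{1}{\sqrt{2\pi}}\int e^{-(x-\vartheta\sigma_1)^2/2}\{\frac{1}{\sqrt{2\pi}}\int_{M-Nx^2}^{\infty} e^{-y^2/2}dy\}dx$, takes $M$ and $N$ as given, and uses hypothetical function names. The argument `shift` stands for the product $\vartheta\sigma_1$.

```python
import math


def upper_tail(z):
    """P(Y >= z) for a standard normal variate Y."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def standardized_error(theta, theta0, n):
    """The value of vartheta obtained from theta = theta0 + vartheta / sqrt(n)."""
    return (theta - theta0) * math.sqrt(n)


def limiting_power(M, N, shift, steps=4000, half_width=9.0):
    """Simpson's rule for the integral over x of the normal density centred at
    shift = vartheta * sigma_1, multiplied by P(Y >= M - N*x^2)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = shift - half_width + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        density = math.exp(-0.5 * (x - shift) ** 2) / math.sqrt(2.0 * math.pi)
        total += weight * density * upper_tail(M - N * x * x)
    return total * h / 3.0
```

At `shift = 0` the function returns the level of significance belonging to the chosen $M$ and $N$; for $N > 0$ it increases as `shift` moves away from zero in either direction, which is the behaviour expected of a test unbiased in the limit.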

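Proof. Denote

(75) $\dfrac{\partial^3 \log p(x_i \mid \theta)}{\partial\theta^3} = \chi_i(\theta)$

We may write

(76) $p(x_i \mid \theta_m) = p(x_i \mid \theta_0)\exp\left\{\dfrac{\vartheta\varphi_i}{\sqrt{n_m}} + \dfrac{\vartheta^2\psi_i}{2n_m} + \dfrac{\vartheta^3\chi_i(\theta_i')}{6n_m^{3/2}}\right\}$

where $\theta_i'$ denotes some value intermediate between $\theta_0$ and $\theta_m$. Consequently,
taking into account (39), (47) and (48), we have

(77) $p(E_{n_m} \mid \theta_m) = \prod_{i=1}^{n_m} p(x_i \mid \theta_m) = p(E_{n_m} \mid \theta_0)(1 + \epsilon_m)\,e^{\vartheta\sigma_1 x - \frac{1}{2}\vartheta^2\sigma_1^2}$

where

(78) $\log(1 + \epsilon_m) = \dfrac{\vartheta^2\sigma_2}{2\sqrt{n_m}}\left(y\sqrt{1-r^2} + xr\right) + \dfrac{\vartheta^3}{6n_m^{3/2}}\sum_{i=1}^{n_m}\chi_i(\theta_i')$

10 J. Neyman: "Smooth" Test for Goodness of Fit. Skandinavisk Aktuarietidskrift,
(1937), p. 186.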


It is seen that, if $m \to \infty$, then $\epsilon_m$ tends to zero, uniformly in every bounded
region of the plane, $S$, of $x$ and $y$. Denote by $s$ any bounded region in $S$ and by
$W_m(s)$ a region in $W_{n_m}$ of which $s$ is a transformation by means of the formulae
(39), (47) and (48). The probability of $E_{n_m}$ falling within $W_m(s)$ is equal to
that of the point with coordinates $x$ and $y$ falling within $s$. The former of these
probabilities is represented by the integral of (77) over $W_m(s)$ and the latter
by the integral taken over $s$ of the elementary probability law $p_m(x, y \mid \theta_m)$ of
$x$ and $y$, corresponding to the value $\theta_m$ of $\theta$. Owing to the formula (77) we may
write

(79) $p_m(x, y \mid \theta_m) = p_m(x, y \mid \theta_0)(1 + \eta_m)\,e^{\vartheta\sigma_1 x - \frac{1}{2}\vartheta^2\sigma_1^2}$

where, owing to (78), $\eta_m$ tends uniformly to zero in $s$ as $m \to \infty$. Remembering
the connection between $u_n$, $v_n$ and $x$, $y$ and also the inequality (56), which is
valid for sufficiently large values of $n$, we conclude that

(80) $p_m(x, y \mid \theta_0) = \dfrac{1}{2\pi}e^{-\frac{1}{2}(x^2+y^2)} + Q_m$

where $Q_m$ has the property that, whatever be $\epsilon > 0$, for sufficiently large values
of $m$

(81) $\left|\iint_s Q_m\,dx\,dy\right| < \epsilon$

where $s$ is any bounded region in $S$. It follows that

(82) $p_m(x, y \mid \theta_m) = \left\{\dfrac{1}{2\pi}e^{-\frac{1}{2}(x^2+y^2)} + Q_m\right\}e^{\vartheta\sigma_1 x - \frac{1}{2}\vartheta^2\sigma_1^2}(1 + \eta_m)$


and that therefore, whatever be the bounded region $s$,

(83) $\lim_{m\to\infty}\iint_s p_m(x, y \mid \theta_m)\,dx\,dy = \dfrac{1}{2\pi}\iint_s e^{-\frac{1}{2}\{(x-\vartheta\sigma_1)^2 + y^2\}}\,dx\,dy$

It is known, however, that whenever an integral probability law tends to a
fixed limit uniformly within any bounded region, then it must do so within the
whole space. It follows therefore that the formula (83) is valid for any region $s$
whether bounded or not. But

(84) $\beta_m(\theta_m) = \iint_{y \ge M - Nx^2} p_m(x, y \mid \theta_m)\,dx\,dy$

and it follows that

(85) $\lim_{m\to\infty}\beta_m(\theta_m) = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}\left\{e^{-\frac{1}{2}(x-\vartheta\sigma_1)^2}\,\dfrac{1}{\sqrt{2\pi}}\int_{M-Nx^2}^{+\infty} e^{-\frac{1}{2}y^2}\,dy\right\}dx,$

which completes the proof of Proposition III.





It is important to be clear about the exact meaning of the Proposition III.
Suppose for example that in a particular case $\vartheta = 1$ and consider a sequence
of situations in which

(86) $n_1 = 100$, $n_2 = 100^2$, $\ldots$, $n_m = 100^m$, $\ldots$

$\theta_1 = \theta_0 + .1$, $\theta_2 = \theta_0 + .01$, $\ldots$, $\theta_m = \theta_0 + (.1)^m$, $\ldots$

If this were the case, then the Proposition III would be applicable and we
could affirm that the sequence of the power functions $\beta_m(\theta)$, each considered
at the appropriate point $\theta_m$, has a limit, represented by the double integral in
the right hand side of (85) with $\vartheta\sigma_1 = 1$. Accordingly, if we were interested
in the value of the power function at $\theta' = \theta_0 + .02$ with $n = 10000$ and $\sigma_1 = 1$,
then we could hope to obtain its approximate value calculating the double
integral in (85) with

(87) $\vartheta = (\theta' - \theta_0)\sqrt{n} = 2$

These are legitimate conclusions. However, it would be wrong to consider as
proved that, if in the same example we increase the size of $n$ to $n' = 40000$,
then the value of the power function at $\theta = \theta'$ will be represented by its limit
(85) with $\vartheta = 4$ and with about the same accuracy as previously. It is just
possible that to attain the same accuracy at $\vartheta = 4$ a value of $n$ greater than
$n'$ will be needed. This of course would imply a corresponding change in $\theta'$.
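Proposition IV. If the conditions of Proposition III are satisfied but
either $|r| = 1$ or $\psi_i$ is independent of $x_i$, then

(88) $\lim_{m\to\infty}\beta_m(\theta_m) = 1 - \dfrac{1}{\sqrt{2\pi}}\int_{\vartheta\sigma_1-\lambda}^{\vartheta\sigma_1+\lambda} e^{-\frac{1}{2}x^2}\,dx$

The proof of this proposition is quite analogous to that of Proposition III.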
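The limit in Proposition IV is an ordinary normal integral and is easily evaluated. The sketch below assumes the form $1 - \frac{1}{\sqrt{2\pi}}\int_{\vartheta\sigma_1-\lambda}^{\vartheta\sigma_1+\lambda} e^{-x^2/2}dx$, which corresponds to rejecting when a normal variate with mean $\vartheta\sigma_1$ and unit S.D. falls outside $(-\lambda, +\lambda)$; the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def limiting_power_two_sided(vartheta, sigma1, lam):
    """One minus the normal integral between vartheta*sigma1 - lam and vartheta*sigma1 + lam."""
    centre = vartheta * sigma1
    return 1.0 - (normal_cdf(centre + lam) - normal_cdf(centre - lam))
```

For `vartheta = 0` the function returns the level of significance fixed by $\lambda$, and it is symmetrical in `vartheta`, so that errors of either sign in the hypothesis tested are detected equally often in the limit.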
6. Examples. 

Example I. Consider the case where it is known for certain that

(89) $p(x_i \mid \theta) = \dfrac{1}{\pi}\,\dfrac{1}{1 + (x_i - \theta)^2}$ for $-\infty < x_i < \infty$

but where the actual value of $\theta$ is doubtful and it is desired to test the hypothesis
$H_0$ that $\theta = \theta_0 = 0$, the alternative possibilities being both $\theta < 0$ and $0 < \theta$.
Before applying the test unbiased in the limit it is natural to try the unbiased
test of type A. The critical region $w_0$ of this test is defined by the inequality

(90) $\sum_{i=1}^{n}\psi_i + \left(\sum_{i=1}^{n}\varphi_i\right)^2 \ge a + b\sum_{i=1}^{n}\varphi_i$

where the constants $a$ and $b$ must be found so as to satisfy the conditions

(91) $\int\cdots\int_{w_0} p(E_n \mid \theta_0)\,dx_1\cdots dx_n = \alpha$

(92) $\int\cdots\int_{w_0} \sum_{i=1}^{n}\varphi_i\,p(E_n \mid \theta_0)\,dx_1\cdots dx_n = 0$




The technical difficulties involved in this problem are considerable and this
may induce us to apply the test unbiased in the limit. Following the above
theory we have

(93) $\varphi_i = \dfrac{2x_i}{1 + x_i^2}$

(94) $\psi_i = \dfrac{4x_i^2}{(1 + x_i^2)^2} - \dfrac{2}{1 + x_i^2}$

(95) $\chi_i(\theta) = \dfrac{16(x_i - \theta)^3}{(1 + (x_i - \theta)^2)^3} - \dfrac{12(x_i - \theta)}{(1 + (x_i - \theta)^2)^2}$

It is easily seen that all the limiting conditions of the theory are fulfilled and
that, in particular, $|\chi_i(\theta)|$ cannot exceed a fixed limit, approximately equal to 3.
We have further

(96) $\mathcal{E}(\varphi_i^2) = -\mathcal{E}(\psi_i) = \sigma_1^2 = \tfrac{1}{2}$

Similarly

(97) $\mathcal{E}\{(\psi_i + \sigma_1^2)^2\} = \sigma_2^2 = \tfrac{5}{8}$

(98) $\mathcal{E}(\varphi_i\psi_i) = 0 = r$

It follows that the regions $\bar{w}_n$, the sequence of which determines the test
which is unbiased in the limit, are defined by the inequality

(99) $4\sum_{i=1}^{n}\dfrac{x_i^2}{(1 + x_i^2)^2} - 2\sum_{i=1}^{n}\dfrac{1}{1 + x_i^2} + 4\left(\sum_{i=1}^{n}\dfrac{x_i}{1 + x_i^2}\right)^2 \ge M\sqrt{\dfrac{5n}{8}} - \dfrac{n}{2}$

where $M$ should be calculated so as to satisfy (27) with

(100) $N = \dfrac{\sigma_1^2\sqrt{n}}{\sigma_2\sqrt{1 - r^2}} = \sqrt{\dfrac{2n}{5}}$

In order to test the hypothesis $H_0$ we have therefore to observe the values
$x_1, x_2, \ldots, x_n$ and to substitute them into the left hand side of (99). If the
inequality is satisfied then the hypothesis should be rejected.

Approximate values of the power function could be obtained from the right
hand side of (85) with

(101) $\vartheta\sigma_1 = \theta\sqrt{n/2}$.
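The quantities of Example I lend themselves to a direct numerical check. The following sketch is illustrative and uses hypothetical function names. It evaluates $\varphi_i$ and $\psi_i$ for the law $p(x \mid \theta) = 1/\{\pi(1 + (x - \theta)^2)\}$ at $\theta_0 = 0$, verifies the moments by the substitution $x = \tan u$, and forms the two sides of the inequality $\sum\psi_i + (\sum\varphi_i)^2 \ge M\sigma_2\sqrt{n} - n\sigma_1^2$ (the case $r = 0$). The values $\sigma_1^2 = 1/2$ and $\sigma_2^2 = 5/8$ used as defaults are those obtained by this computation, and $M$ is supposed given.

```python
import math


def phi(x):
    """First logarithmic derivative of the Cauchy law at theta = 0."""
    return 2.0 * x / (1.0 + x * x)


def psi(x):
    """Second logarithmic derivative of the Cauchy law at theta = 0."""
    return (2.0 * x * x - 2.0) / (1.0 + x * x) ** 2


def null_moment(f, steps=20000):
    """Expectation of f(X) for X following the Cauchy law with theta = 0,
    computed with x = tan(u) and the midpoint rule in u over (-pi/2, pi/2)."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        u = -0.5 * math.pi + (k + 0.5) * h
        total += f(math.tan(u))
    return total / steps


def statistic(xs):
    """Left hand side of the inequality: sum(psi_i) + (sum(phi_i))^2."""
    return sum(psi(x) for x in xs) + sum(phi(x) for x in xs) ** 2


def critical_value(M, n, sigma1_sq=0.5, sigma2_sq=0.625):
    """Right hand side of the inequality when r = 0."""
    return M * math.sqrt(sigma2_sq * n) - n * sigma1_sq
```

The hypothesis would be rejected whenever `statistic(xs) >= critical_value(M, len(xs))`.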

Example II. Let us assume as given that

(102) $p(x_i \mid \theta) = \theta e^{-\theta x_i}$ for $0 < x_i$

$p(x_i \mid \theta) = 0$ elsewhere

with $\theta > 0$, the hypothesis to test being that $\theta = \theta_0 = 1$, with the alternatives
both $\theta < 1$ and $\theta > 1$.

In this particular example the unbiased test of type A is easily found$^{11}$ and
moreover$^{12}$ it has also the property of being of type $A_1$. But this circumstance
does not diminish the illustrative character of the example. We have

(103) $\varphi_i = 1 - x_i$

(104) $\psi_i = -1 = \text{constant}$

It follows that the regions forming the test which is unbiased in the limit are
determined by the inequality (59). We have further

(105) $\sigma_1^2 = \mathcal{E}(\varphi_i^2) = \int_0^\infty (1 - x_i)^2 e^{-x_i}\,dx_i = 1$

and the inequality (59) reduces to

(106) $\left|\sum_{i=1}^{n}(1 - x_i)\right| \ge \lambda\sqrt{n}$

with $\lambda$ taken from the tables of the normal integral according to (60) and to the
chosen value $\alpha$. Approximate values of the power function can be calculated
from, say

(107) $\beta_\infty(\vartheta) = 1 - \dfrac{1}{\sqrt{2\pi}}\int_{\vartheta-\lambda}^{\vartheta+\lambda} e^{-\frac{1}{2}x^2}\,dx$

with

(108) $\vartheta = (\theta - 1)\sqrt{n}$

The simplicity of the example considered permits us to calculate the exact
power function of the test and it may be interesting to obtain its limit $\beta_\infty(\vartheta)$ in
another and a more direct way. Write

(109) $\sum_{i=1}^{n} x_i = y$

It is known that, if the probability law of each of the X's is given by (102)
then the probability law of $y$ is

(110) $p(y \mid \theta) = \dfrac{\theta^n y^{n-1} e^{-\theta y}}{(n-1)!}$ for $0 < y$

$p(y \mid \theta) = 0$ otherwise


11 J. Neyman and E. S. Pearson, loc. cit., p. 18 et seq.

12 J. Neyman: Estimation statistique traitée comme un problème de probabilité classique.
Series Actualités scientifiques et industrielles. Paris, (1938). (In the press.)


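It follows that the exact form of the power function corresponding to the test
(106) is

(111) $\beta(\theta \mid w_n) = 1 - \dfrac{\theta^n}{(n-1)!}\int_{n-\lambda\sqrt{n}}^{n+\lambda\sqrt{n}} y^{n-1} e^{-\theta y}\,dy$

For values of $n$ about 100 or more and for the values of $\theta$ close to unity the
distribution of, say,

(112) $z = \dfrac{\theta\sum x_i - n}{\sqrt{n}} = \dfrac{\theta y - n}{\sqrt{n}}$

is practically normal with mean equal to zero and S.D. equal to unity. It
follows that the integral in the right hand side of (111) is practically equal to the
normal integral taken within the limits which are obtained by substituting in
(112) the limits of $y$ in (111). After some easy transformation we have, with a
considerable accuracy

(113) $\beta(\theta \mid w_n) \approx 1 - \dfrac{1}{\sqrt{2\pi}}\int_{(\theta-1)\sqrt{n}-\lambda\theta}^{(\theta-1)\sqrt{n}+\lambda\theta} e^{-\frac{1}{2}z^2}\,dz$

or, after some further transformations and taking into account (108)

(114) $\beta(\theta \mid w_n) \approx 1 - \dfrac{1}{\sqrt{2\pi}}\int_{\vartheta-\lambda(1+\vartheta/\sqrt{n})}^{\vartheta+\lambda(1+\vartheta/\sqrt{n})} e^{-\frac{1}{2}u^2}\,du$

and it is seen that, when $\vartheta$ is fixed and $n$ indefinitely increases, then $\beta(\theta \mid w_n)$
does tend to $\beta_\infty(\vartheta)$.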
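The comparison between the exact and the limiting power in Example II can also be made numerically. The sketch below is illustrative, with hypothetical function names. It uses the fact that the sum $y$ of $n$ independent variates following $\theta e^{-\theta x}$ has the distribution function $1 - e^{-\theta y}\sum_{k=0}^{n-1}(\theta y)^k/k!$, and takes the critical region to be $|n - y| \ge \lambda\sqrt{n}$ and the limiting power to be $1 - \frac{1}{\sqrt{2\pi}}\int_{\vartheta-\lambda}^{\vartheta+\lambda} e^{-x^2/2}dx$ with $\vartheta = (\theta - 1)\sqrt{n}$.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def erlang_cdf(y, n, theta):
    """P(x_1 + ... + x_n <= y) when each x_i follows theta * exp(-theta * x).
    Terms are formed in logarithms so that large n does not underflow."""
    if y <= 0.0:
        return 0.0
    x = theta * y
    log_x = math.log(x)
    tail = sum(math.exp(-x + k * log_x - math.lgamma(k + 1)) for k in range(n))
    return 1.0 - tail


def exact_power(theta, n, lam):
    """Probability that y falls outside (n - lam*sqrt(n), n + lam*sqrt(n))."""
    lower = n - lam * math.sqrt(n)
    upper = n + lam * math.sqrt(n)
    return 1.0 - (erlang_cdf(upper, n, theta) - erlang_cdf(lower, n, theta))


def limiting_power(theta, n, lam):
    """Normal approximation with vartheta = (theta - 1) * sqrt(n)."""
    vartheta = (theta - 1.0) * math.sqrt(n)
    return 1.0 - (normal_cdf(vartheta + lam) - normal_cdf(vartheta - lam))
```

Trying a fixed $\vartheta$ with increasing $n$ shows the approach to the limit to be rather slow, the discrepancy being of the order of $\lambda\vartheta/\sqrt{n}$ in the limits of integration, in agreement with (114).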


University College, London, 



THE TRANSFORMATION OF STATISTICS TO SIMPLIFY THEIR 

DISTRIBUTION* 


By Harold Hotelling and Lester R. Frankel

1. Introduction. The custom of regarding a result as significant if it exceeds
two or three times its standard error has now given way among informed
statisticians to a consideration of the exact probabilities associated with the
distribution of the statistic in question. For example, in such problems as that of
examining the significance of the difference between the means of two samples,
particularly small samples, it is no longer adequate to regard the difference of
means, divided by the sample estimate of its standard error, as normally
distributed. The significance of this ratio, "Student's ratio," is judged instead by
the value of

(1) $P = 2\int_t^\infty \phi_n(z)\,dz$

where $n$ is the number of degrees of freedom entering into the estimate of
variance, and

(2) $\phi_n(z) = \dfrac{1}{\sqrt{n\pi}}\,\dfrac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}\,\dfrac{1}{\left(1 + \frac{z^2}{n}\right)^{\frac{n+1}{2}}}$

If the probability law underlying the observations themselves is normal, and
they are independent, $P$ is the exact probability of the value of $t$ obtained being
equalled or exceeded on the hypothesis that there is no real difference between
the means.

Methods of approximating $P$ have been studied by R. A. Fisher$^1$ and by
W. A. Hendricks,$^2$ and tables have been presented by Student$^3$ and Fisher.$^4$
Nevertheless, the practical statistician will very frequently wish to make
judgments of significance without stopping to consult a table, or laboriously to
compute $P$, and will tend to revert to the former inaccurate but convenient
practice of treating $t$ as normally distributed with unit variance. The essential
reason for this is that the normal distribution to which that of $t$ approximates
for large values of $n$ has only one parameter in the expression for the probability.
Hence it is easy to remember a few important values, such as those corresponding
to $P = .01$ and $.05$; and when values of $P$ representing other levels of
significance are in question, the single-entry tables of the normal probability
integral are more easily available and easier to use than the double-entry table of
Student's integral. Indeed, $t$ is a more useful statistic than Student's original
ratio of mean to sample standard deviation, to which it is in the simplest case
proportional, partly because of the close approximation of $t$ for large samples to a
normally distributed variate of unit variance.

* Presented at the joint meeting at Indianapolis of the American Mathematical Society
and the Institute of Mathematical Statistics, December 30th, 1937.

1 Expansion of Student's Integral in Powers of $n^{-1}$. Metron, vol. 6 (1925).

2 Annals of Mathematical Statistics, vol. 7 (1936), pp. 210-221.

3 New Tables for Testing the Significance of Observations. Metron, vol. 5 (1925).

4 Statistical Methods for Research Workers, Oliver and Boyd, 1925-1936. Tables IV and
VI.

For more complicated statistics the practical need for something simpler
than the exact distribution is even more urgent, on account of the larger number
of parameters involved in the distributions. For example the large class of
problems giving rise to probabilities expressible as incomplete beta functions
require for exactitude the use of Pearson's extensive triple-entry table,$^5$ and
even this is inadequate for some ranges of the parameters. The shorter tables of
R. A. Fisher$^6$ and of Snedecor$^7$ are helpful, but are also necessarily of triple entry.

It is a common practice, for example, among economists and psychologists, to 
select either by graphic methods or by preliminary calculation that one, out of 
many tests that might be applied to available data, for which P is the least. 
Such selection evidently introduces a bias, which is the more subtle because the 
tests giving high and therefore insignificant probabilities are likely to be
forgotten. Often the only way to guard against such fallacies is to insist on a
value of P lower than is easily determined from tables. Thus, if k independent
tests of significance have been made, and only the smallest value P is reported,
its significance should be judged not by this value P itself, but by the probability

$P' = 1 - (1 - P)^k = kP - \cdots$

of the least value being so small. If we equate P' to some such standard value 
as .01, then P must, for this standard level of confidence, take only a fraction, 
approximately 1/k, of this value. Such a small probability will often fall 
outside the range of existing tables. 
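
The adjustment just described is easily sketched numerically. The following Python fragment is an editorial illustration only; the function names `adjusted_p` and `required_p` are hypothetical, and the k tests are assumed independent, as in the text.

```python
def adjusted_p(p_min, k):
    """Probability P' that the least of k independent P values is <= p_min."""
    return 1.0 - (1.0 - p_min) ** k


def required_p(p_prime, k):
    """Value the least P must reach for an overall level p_prime over k tests."""
    return 1.0 - (1.0 - p_prime) ** (1.0 / k)


if __name__ == "__main__":
    # With P' fixed at .01, the required P is close to .01 / k.
    for k in (1, 5, 20):
        print(k, adjusted_p(0.01, k), required_p(0.01, k))
```

For k = 10 and P' = .01 the required value of P is very nearly .001, a level at which a double-entry table of Student's integral is of little help.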

Instead of relying on tables or direct computation from the exact distribution 
of a statistic, it will sometimes be desirable to use a modification of the statistic, 
selected so as to have the normal or some other standard distribution. We 
shall consider a type of transformation of a statistic such that the distribution 
becomes the limiting form of the original distribution as the sample size increases. 
Thus our transformation will reduce to the application to the statistic of a
correction which will be small when the sample is large. We shall show how to
make simple approximate corrections of this character for two cases. 


5 Tables of the Incomplete Beta Function, Biometrika Office, 1934.

6 Loc. cit., Tables IV and VI.

7 Calculation and Interpretation of Analysis of Variance and Covariance, Ames, Iowa,
Collegiate Press, 1934.



DISTRIBUTION OF STATISTICS 




The first of these is the Student ratio t, the lower limit of the integral in (1).
Putting

(3)  $\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$

and

(4)  $P = 2\int_x^{\infty} \phi(z)\,dz,$

which in view of (1) and the fact that the integral of each distribution from $-\infty$
to $\infty$ is unity is equivalent to

(5)  $\int_0^x \phi(z)\,dz = \int_0^t \phi_n(z)\,dz,$

we shall show that x has an asymptotic expansion:

(6)  $x \sim t\left\{1 - \frac{t^2 + 1}{4n} + \frac{13t^4 + 8t^2 + 3}{96n^2} - \frac{35t^6 + 19t^4 + t^2 - 15}{384n^3} + \frac{6271t^8 + 3224t^6 - 102t^4 - 1680t^2 - 945}{92160n^4} + \cdots\right\}.$

It will frequently be a sufficient approximation to treat

$t\left(1 - \frac{t^2 + 1}{4n}\right)$

as normally distributed. These appear to be approximations of practical
value when $n > t^2$.
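
A short numerical sketch may make the use of the series (6) concrete. The Python below is an editorial illustration, not the authors' computation; the function name `normal_deviate` and its `terms` argument are hypothetical, and the coefficients are those of (6) as read above.

```python
from statistics import NormalDist


def normal_deviate(t, n, terms=4):
    """Approximate x from series (6), keeping `terms` correction terms (0 to 4)."""
    t2 = t * t
    corrections = [
        -(t2 + 1.0) / (4.0 * n),
        (13.0 * t2**2 + 8.0 * t2 + 3.0) / (96.0 * n**2),
        -(35.0 * t2**3 + 19.0 * t2**2 + t2 - 15.0) / (384.0 * n**3),
        (6271.0 * t2**4 + 3224.0 * t2**3 - 102.0 * t2**2
         - 1680.0 * t2 - 945.0) / (92160.0 * n**4),
    ]
    return t * (1.0 + sum(corrections[:terms]))


if __name__ == "__main__":
    t, n = 2.042, 30                         # two-sided 5 per cent point of t, n = 30
    target = NormalDist().inv_cdf(0.975)     # corresponding normal deviate
    for j in range(1, 5):
        print(j, round(normal_deviate(t, n, j), 4), round(target, 4))
```

With t = 1.812 and n = 10 the first three partial sums come out near 1.618, 1.650 and 1.643, against the normal deviate 1.645 for P = .10.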

The second statistic whose transformation to a function having its limiting
distribution we shall consider is the generalized Student ratio T, appropriate
to all the uses to which t can be put, but with a multiplicity of variates instead
of one to serve as the basis of the test of significance. 8 This is defined with
reference to variates $x_1, \cdots, x_p$, together with a linear function of sample values
(proportional for example to the difference between the means in two samples),
such that if $\xi_i$ is the value of this function of the sample values of $x_i$ ($i = 1, \cdots, p$)
then the variance of $\xi_i$ in the population sampled is the same as that of $x_i$,
and on the hypothesis to be tested, the population mean of each $\xi_i$ is zero.
In terms of unbiased quadratic estimates $s_{ij}$ of the covariances among
$x_1, \cdots, x_p$, each based on n degrees of freedom, we may define $l_{ij}$ as the cofactor
of $s_{ij}$ divided by the determinant of the statistics $s_{ij}$. Then T is defined by

(7)  $T^2 = \sum\sum l_{ij}\,\xi_i\,\xi_j,$


8 Harold Hotelling, The Generalization of Student's Ratio. Annals of Mathematical
Statistics, vol. 2 (1931).





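the summations running independently with respect to i and j from 1 to p.
For independent samples from a multivariate normal population, the distribution
of T has been shown 9 to be

(8)  $\frac{2\,\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{p}{2}\right)\Gamma\left(\frac{n-p+1}{2}\right) n^{p/2}} \cdot \frac{T^{p-1}\,dT}{\left(1 + \frac{T^2}{n}\right)^{(n+1)/2}}.$

As n increases, the distribution of T approaches the $\chi$ distribution with p degrees
of freedom:

(9)  $\frac{\chi^{p-1}\, e^{-\chi^2/2}\,d\chi}{2^{(p-2)/2}\,\Gamma\left(\frac{p}{2}\right)}.$

By equating the probabilities derived from these two distributions, we shall
define $\chi$ as a function of T, and obtain asymptotic expansions for the functions
$\chi$ and $\chi^2$ thus defined.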

Since the probability associated with T is expressible in terms of the incomplete 
beta function, or the analysis of variance distribution integral, it follows that 
any of the many common statistics, of which simple functions have this distribution,
can be expressed simply in terms of T. Tests of significance in a wide
variety of cases may therefore be made with the help of the asymptotic expansion
corresponding to $T^2$, together with a table of $\chi^2$.

A further advantage of the transformation of a statistic into a normally
distributed variate of unit variance and zero mean is that further statistical
tests are possible with such variates. Since a great part of statistical theory is
based on the assumption of such normal distributions, an extensive field of
applications becomes available in this way. For example, if several independent
tests give values of t based on various numbers of degrees of freedom, and it is
desired to combine these tests so as to get a single probability, the corresponding
values of the normally distributed variate x defined above may be squared and
added. The sum will then have the $\chi^2$ distribution, with a number of degrees of
freedom equal to the number of values of t used. In a similar manner, the
values of $\chi^2$ corresponding to a number of independently determined values of
$T^2$ may be added, and the sum will have the $\chi^2$ distribution with a number of
degrees of freedom equal to the sum of the various values of p involved.
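
The combination of several t tests described above may be sketched as follows. This Python fragment is an editorial illustration under stated assumptions: the names `normal_deviate` and `combine_t_tests` are hypothetical, the tests are taken to be independent, and only the first three correction terms of the expansion of x in powers of 1/n are used.

```python
def normal_deviate(t, n):
    """x from t with n degrees of freedom, using three correction terms."""
    t2 = t * t
    return t * (1.0
                - (t2 + 1.0) / (4.0 * n)
                + (13.0 * t2**2 + 8.0 * t2 + 3.0) / (96.0 * n**2)
                - (35.0 * t2**3 + 19.0 * t2**2 + t2 - 15.0) / (384.0 * n**3))


def combine_t_tests(tests):
    """tests: iterable of (t, n) pairs.

    Returns the sum of squared normalized deviates and its degrees of
    freedom, to be referred to a table of chi-square.
    """
    tests = list(tests)
    statistic = sum(normal_deviate(t, n) ** 2 for t, n in tests)
    return statistic, len(tests)


if __name__ == "__main__":
    print(combine_t_tests([(2.042, 30), (2.228, 10)]))
```

The returned sum is then looked up in a single-entry table with the stated degrees of freedom, whatever the several values of n may have been.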

The advantages of this type of what may be called “normalization” of a
statistic have been brought out by R. A. Fisher for the particular case of the
correlation coefficient. His use 10 of $z = \tfrac{1}{2}\log\frac{1+r}{1-r}$ facilitates such operations
as the averaging of values obtained from independent samples, or taking the

9 Harold Hotelling, loc. cit.

10 Statistical Methods for Research Workers, Sec. 36. 





difference between two values, with the testing of significance of the result in 
each case. This is because z, unlike r, has a nearly normal distribution, with 
variance nearly independent of the population value. We note in passing that
this function is the same as $\tanh^{-1} r$, and may therefore be determined accurately
and readily from the Smithsonian Institution Tables of Hyperbolic and
Exponential Functions.
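
The identity of Fisher's z with the inverse hyperbolic tangent can be checked directly. The Python below is an editorial sketch; `fisher_z` and `average_correlation` are hypothetical names, and the unweighted average on the z scale is an assumption made only for illustration.

```python
import math


def fisher_z(r):
    """Fisher's transformation z = (1/2) log((1 + r) / (1 - r))."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def average_correlation(rs):
    """Average sample correlations on the z scale, then return to the r scale."""
    z_mean = sum(fisher_z(r) for r in rs) / len(rs)
    return math.tanh(z_mean)


if __name__ == "__main__":
    r = 0.5
    print(fisher_z(r), math.atanh(r))          # the two values agree
    print(average_correlation([0.2, 0.5, 0.6]))
```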

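2. Normalization of t. The “duplication formula” in the theory of the
Gamma function 11 shows that

$\Gamma\left(\frac{n+1}{2}\right) = \frac{2^{1-n}\sqrt{\pi}\;\Gamma(n)}{\Gamma\left(\frac{n}{2}\right)}.$

Substituting this in (2) and taking logarithms we have:

(10)  $\log\phi_n(z) = -\tfrac{1}{2}\log n - (n - 1)\log 2 + \log\Gamma(n) - 2\log\Gamma\left(\frac{n}{2}\right) - \frac{n+1}{2}\log\left(1 + \frac{z^2}{n}\right).$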
The last logarithm may be expanded in a series of powers of 1/n which not only
converges uniformly on the interval $0 \le z \le t$ when $n > t^2$, but has the property
of being a uniformly asymptotic representation of the function on this interval.
This means that the sum of the first j terms of the series (j = 0, 1, 2, ...) differs
from the function represented by a quantity whose product by $n^{j+1}$ has, for
sufficiently large values of n, an upper bound independent of z, so long as z
remains in this interval. Uniformly asymptotic series have a number of
important properties, among which is 12 term by term integrability with respect
to z. In this sense we have the uniform asymptotic representation:

(11)  $\log\left(1 + \frac{z^2}{n}\right) \sim \frac{z^2}{n} - \frac{z^4}{2n^2} + \frac{z^6}{3n^3} - \cdots$
We shall obviously have another uniform asymptotic representation if we add to
this, term by term, asymptotic series with terms independent of z, such as those
for the gamma function logarithms in (10). Since 13

(12)  $\log\Gamma(n) \sim \tfrac{1}{2}\log 2\pi + \left(n - \tfrac{1}{2}\right)\log n - n + \sum_{r=1}^{\infty} \frac{(-1)^{r-1} B_r}{2r(2r-1)\,n^{2r-1}},$
where

$B_1 = \tfrac{1}{6},\quad B_2 = B_4 = \tfrac{1}{30},\quad B_3 = \tfrac{1}{42},\quad B_5 = \tfrac{5}{66},\ \cdots$


11 Whittaker and Watson, Modern Analysis, 4th ed., p. 240.

12 H. Schmidt, Beiträge zu einer Theorie der allgemeinen asymptotischen Darstellungen.
Math. Annalen, vol. 113 (1937), pp. 629-666. The property mentioned above is proved in
Schmidt's Theorem 6.

13 Whittaker and Watson, loc. cit., pp. 252, 125.





are the Bernoulli numbers, we obtain upon substituting in (10) this and the
similar formula for $\log\Gamma\left(\frac{n}{2}\right)$, together with (11), and some simplification,

(13)  $\log\phi_n(z) \sim -\tfrac{1}{2}\log 2\pi - \frac{z^2}{2} + \frac{-1 - 2z^2 + z^4}{4n} + \frac{3z^4 - 2z^6}{12n^2} + \frac{1 - 4z^6 + 3z^8}{24n^3} + \frac{5z^8 - 4z^{10}}{40n^4} + \cdots$
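
The expansion of $\log\phi_n(z)$ just obtained can be checked numerically against the exact logarithm of Student's density. The Python below is an editorial sketch: it assumes the usual form of Student's density with n degrees of freedom for (2), which is not reproduced in this part of the paper, and the function names are hypothetical.

```python
import math


def log_phi_n_exact(z, n):
    """Logarithm of Student's density with n degrees of freedom (assumed form of (2))."""
    return (math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0)
            - 0.5 * math.log(n * math.pi)
            - 0.5 * (n + 1) * math.log1p(z * z / n))


def log_phi_n_series(z, n):
    """The asymptotic expansion, truncated after the term in n**-4."""
    z2 = z * z
    return (-0.5 * math.log(2.0 * math.pi) - z2 / 2.0
            + (z2**2 - 2.0 * z2 - 1.0) / (4.0 * n)
            + (3.0 * z2**2 - 2.0 * z2**3) / (12.0 * n**2)
            + (1.0 - 4.0 * z2**3 + 3.0 * z2**4) / (24.0 * n**3)
            + (5.0 * z2**4 - 4.0 * z2**5) / (40.0 * n**4))


if __name__ == "__main__":
    for z, n in ((1.0, 50), (2.0, 100), (3.0, 10)):
        print(z, n, log_phi_n_exact(z, n) - log_phi_n_series(z, n))
```

The discrepancy is negligible when n is large compared with $z^2$ and grows rapidly as $z^2$ approaches n, in keeping with the condition $n > t^2$.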


Upon differentiating (5) we obtain:

(14)  $\phi(x)\,\frac{dx}{dt} = \phi_n(t).$

Since $\phi$ is simply the normal distribution function (3), this may be written:

(15)  $-\tfrac{1}{2}\log 2\pi - \frac{x^2}{2} + \log\frac{dx}{dt} = \log\phi_n(t).$


We shall always in this paper use the symbol “lim” to mean the limit as n
approaches infinity. The functions of n and z, or of n and t, which we shall
denote by R, R', R'', with or without subscripts, are to be such that the absolute
value of each has an upper bound independent of n, z and t so long as n > 1,
and z and t are confined to some fixed finite interval.

From (13) we have that $\lim \log\phi_n(z) = \log\phi(z)$,
whence, by the continuity of the exponential function,

$\lim \phi_n(z) = \phi(z).$

This holds uniformly for $0 \le z \le t$. Subtracting $\int_0^t \phi(z)\,dz$ from both sides of
(5) we therefore find that

(16)  $\int_t^x \phi(z)\,dz = \int_0^t \{\phi_n(z) - \phi(z)\}\,dz$

can by choosing n large enough be made as small as we please. Since $\phi(z) > 0$,
it follows that the function x of t and n is such that

(17)  $\lim x = t.$


A parallel argument, proving slightly more than (17), is the following. From
(13),

$\log\phi_n(z) = \log\phi(z) + \frac{R'}{n},$

where R' is a bounded function of the kind described above. Therefore

$\phi_n(z) = \phi(z)\left(1 + \frac{R''}{n}\right).$

Substituting this in (16) we have that

$\int_t^x \phi(z)\,dz = \frac{1}{n}\int_0^t \phi(z)\,R''\,dz.$

From the mean value theorem of integral calculus it then follows that

(18)  $x = t + \frac{R}{n}.$

An asymptotic series may be substituted in a power series, and the result is a
valid asymptotic representation of the corresponding function. (Schmidt,
loc. cit., Theorem 4.) This justifies taking the exponential of each side of (13)
and arranging in a series of powers of $n^{-1}$ to give

(19)  $\phi_n(z) \sim \phi(z)\left\{1 + \frac{\alpha_1(z)}{n} + \frac{\alpha_2(z)}{n^2} + \cdots\right\}.$

This asymptotic development will, like the original one, hold uniformly in
every finite interval, and may therefore be integrated term by term. Thus

(20)  $\int_0^t \phi_n(z)\,dz = \int_0^t \phi(z)\left\{1 + \frac{\alpha_1(z)}{n} + \cdots + \frac{\alpha_j(z)}{n^j}\right\}dz + \frac{R_{j+1}}{n^{j+1}},$

where $|R_{j+1}|$ has an upper bound independent of n and t when n > 1, and t is
confined to a finite interval, $0 \le t \le T$. Substituting this in (16) we obtain:

(21)  $\int_t^x \phi(z)\,dz = \int_0^t \phi(z)\left\{\frac{\alpha_1(z)}{n} + \cdots + \frac{\alpha_j(z)}{n^j}\right\}dz + \frac{R_{j+1}}{n^{j+1}}.$

In terms of a sequence of functions $f_1, f_2, \cdots$ of t to be defined below, let

(22)  $x_j = t + \frac{f_1}{n} + \frac{f_2}{n^2} + \cdots + \frac{f_j}{n^j}.$

Now $\int_t^{x_j} \phi(z)\,dz$ can be expanded in a series of powers of $n^{-1}$ which converges for
sufficiently large values of n; for the Taylor series

(23)  $\phi(z) = \phi(t) + (z - t)\,\phi'(t) + \cdots$

can be integrated to give a series of powers of $x_j - t$, which by (22) is a
polynomial in $n^{-1}$. As a matter of fact we have from (22) that $x_j - t$ can be made
arbitrarily small by taking n large enough; consequently the series (23) and that
obtained by integration in this way will converge uniformly and absolutely.
We thus have:

(24)  $\int_t^{x_j} \phi(z)\,dz = \frac{1}{n} f_1\phi + \frac{1}{n^2}\left(f_2\phi + \tfrac{1}{2} f_1^2\phi'\right) + \frac{1}{n^3}\left(f_3\phi + f_1 f_2\phi' + \tfrac{1}{6} f_1^3\phi''\right) + \cdots$



Now let us define $f_1, f_2, \cdots$ by equating the coefficient of each power of n
in (24) to that of the same power of n in the right member of (21). This process
gives a sequence of equations

(25)
$f_1\phi = \int_0^t \phi(z)\,\alpha_1(z)\,dz$
$f_2\phi + \tfrac{1}{2} f_1^2\phi' = \int_0^t \phi(z)\,\alpha_2(z)\,dz$
$f_3\phi + f_1 f_2\phi' + \tfrac{1}{6} f_1^3\phi'' = \int_0^t \phi(z)\,\alpha_3(z)\,dz$
$f_4\phi + \left(f_1 f_3 + \tfrac{1}{2} f_2^2\right)\phi' + \tfrac{1}{2} f_1^2 f_2\phi'' + \tfrac{1}{24} f_1^4\phi''' = \int_0^t \phi(z)\,\alpha_4(z)\,dz$

Since $\phi \ne 0$ the first of these equations defines $f_1$ for every value of t; when $f_1$
has been determined, the second equation defines $f_2$; then the third defines $f_3$,
and so forth. It is to be observed that the functions $f_1, f_2, \cdots$ thus determined
are not changed when the value of j appearing in (22) is increased; we have a
unique sequence.

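If for the right-hand member of (15) we substitute that of (13), replacing
z by t, and on the left of (15) put

$x = t + \frac{f_1}{n} + \frac{f_2}{n^2} + \cdots, \qquad \frac{dx}{dt} = 1 + \frac{f_1'}{n} + \frac{f_2'}{n^2} + \cdots,$

and then expand in a formal manner in powers of $n^{-1}$, we shall upon equating
coefficients of like powers of n obtain a sequence of differential equations

(26)
$f_1' - t f_1 = \tfrac{1}{4}\left(-1 - 2t^2 + t^4\right)$
$f_2' - t f_2 = \tfrac{1}{2} f_1'^2 + \tfrac{1}{2} f_1^2 + \tfrac{1}{4} t^4 - \tfrac{1}{6} t^6$
$\cdots$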


These, with the initial condition $f_1 = f_2 = \cdots = 0$ for t = 0, determine the same
sequence of functions as before. The equations (26) are in fact obtainable
simply by differentiating (25) and cancelling out the factor $\phi$. That this
must be true follows from the equivalence of the various formal processes of
manipulating series of powers of $n^{-1}$, whether convergent or divergent, to give
equivalent results. The differential equations are easily solved; the solutions,
at least for $f_1, f_2, f_3$, and $f_4$, are all polynomials. Why they should come out as
polynomials is not immediately obvious; but their calculation is made easier if
each $f_j$ is replaced in the differential equations by a polynomial of degree 2j + 1
with undetermined coefficients, involving only odd powers of t. The f's of lower
order are replaced by values previously determined, and the coefficients are
found by equating like powers of t. This process supplies at each stage more
equations than unknown coefficients; their consistency verifies the assumption
that $f_j$ is a polynomial of the kind specified, at least for $j \le 4$. These
polynomials are the coefficients of the powers of $n^{-1}$ in (6).
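
That the first two polynomials satisfy their differential equations can be verified by exact arithmetic. The Python below is an editorial check, not the authors' procedure: polynomials in t are held as coefficient lists (lowest power first), all helper names are hypothetical, and the equations tested are $f_1' - t f_1 = \tfrac14(t^4 - 2t^2 - 1)$ and $f_2' - t f_2 = \tfrac12 f_1'^2 + \tfrac12 f_1^2 + \tfrac14 t^4 - \tfrac16 t^6$, with $f_1 = -(t^3 + t)/4$ and $f_2 = (13t^5 + 8t^3 + 3t)/96$.

```python
from fractions import Fraction as F


def add(p, q):
    """Sum of two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]


def scale(c, p):
    return [c * a for a in p]


def mul(p, q):
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def deriv(p):
    return [i * a for i, a in enumerate(p)][1:]


def trim(p):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


T = [F(0), F(1)]                                   # the polynomial t
f1 = [F(0), F(-1, 4), F(0), F(-1, 4)]              # -(t^3 + t)/4
f2 = [F(0), F(3, 96), F(0), F(8, 96), F(0), F(13, 96)]


def lhs(f):
    """Left member f' - t f of each differential equation."""
    return trim(add(deriv(f), scale(-1, mul(T, f))))


rhs1 = trim([F(-1, 4), F(0), F(-1, 2), F(0), F(1, 4)])
d1 = deriv(f1)
rhs2 = trim(add(add(scale(F(1, 2), mul(d1, d1)), scale(F(1, 2), mul(f1, f1))),
                [F(0), F(0), F(0), F(0), F(1, 4), F(0), F(-1, 6)]))

first_ok = lhs(f1) == rhs1
second_ok = lhs(f2) == rhs2

if __name__ == "__main__":
    print(first_ok, second_ok)
```

Rational arithmetic is used so that agreement of the two members is exact rather than approximate.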

The series on the right of (24) not only converges but is an asymptotic series
uniformly valid when t varies in any finite interval. Hence upon subtracting
(24) from (21) and taking account of (25) we find that

$\int_{x_j}^{x} \phi(z)\,dz = \frac{R'_{j+1}}{n^{j+1}},$

where $|R'_{j+1}|$ is uniformly bounded. Upon applying the mean value theorem
to the integral on the left we find that x differs from $x_j$, and thus from the first j
terms (22) of the series (6), by a quantity whose product by $n^{j+1}$ remains bounded
when n approaches infinity. This proves the validity of the asymptotic
expansion.


3. Accuracy of the Approximation. To follow through the above processes
in such a way as to obtain useful limits for the error involved in using the first
few terms of the series (6) in place of x would be excessively difficult. However,
the magnitude of the error in taking the first two or three terms as an
approximation to x may be judged from the tables below to be adequately small for
practical purposes, provided $n > t^2$. The essential singularity of the normal
distribution at infinity, in contrast with the algebraic nature of the Student
distribution, means a poorer approximation of one to the other as t increases while n
remains fixed, though a better approximation as n increases. This is illustrated
in the following tables, where it will be observed that the approximations are
better for large than for small values of n, and of

$P = 2\int_t^{\infty} \phi_n(z)\,dz = 2\int_x^{\infty} \phi(z)\,dz.$

It will be seen that for n = 10 and $P \le .001$, the utility of the asymptotic
series, or at least of its first five terms, is vitiated by the rapid oscillation of
consecutive terms, due to the high values of $t^2$ in relation to n.



            P = .10        P = .05               P = .01
            n = 10     n = 10    n = 30     n = 10    n = 30

    t       1.812      2.228     2.042      3.169     2.750
    x_1     1.618      1.896     1.954      2.294     2.554
    x_2     1.650      1.980     1.960      2.754     2.579
    x_3     1.643                           2.446     2.575
    x       1.645           1.960                2.576










            P = .001              P = .0001
            n = 10    n = 30     n = 10    n = 30    n = 100

    t       4.587     3.646      6.22      4.482     4.052
    x_1     2.059     3.212       .05      3.69      3.88
    x_2     4.981     3.313     12.86      3.98      3.89
    x_3     0.896     3.283    -20.44      3.85      3.89
    x_4     7.163     3.293     75.66      3.91      3.89
    x            3.291                 3.891


4. Transformation of the Generalized Student Ratio. The arguments and
methods of calculation set forth in Section 2 may be applied with little or no
change to the transformation of various other statistics in such a way that the
limiting distribution for large samples is reached at once for the transformed
statistic. In particular, to deal with the generalized Student ratio T, we may
equate (8) to (9), represent $\chi$ as an asymptotic expansion with undetermined
coefficients which are functions of T, and then by substituting and equating
like powers of $n^{-1}$ obtain as before a sequence of differential equations for
determining the coefficients. This process gives

(27)  $\chi \sim T\left\{1 - \frac{T^2 + p}{4n} + \frac{13T^4 + 4(p + 1)T^2 + 8 - 5p^2}{96n^2} + \cdots\right\}.$

This reduces to the expansion of x in terms of t previously found if we put p = 1.
It is somewhat more convenient in practice to use $\chi^2$ and $T^2$, to avoid
extracting the square root of the latter expression, and to utilize the existing tables
of $\chi^2$. Ordinarily therefore we should not use (27), but the series

(28)  $\chi^2 \sim T^2\left\{1 - \frac{p + T^2}{2n} + \frac{(4 - p^2) + (2 + 5p)T^2 + 8T^4}{24n^2} + \cdots\right\},$

which may be obtained in the same way, or by squaring (27) in a formal manner.
That these are genuine asymptotic approximations follows by essentially the
same argument as before.
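
The series for $\chi^2$ in terms of $T^2$ lends itself to a brief numerical sketch. The Python below is an editorial illustration; the function name `chi_square_from_T2` is hypothetical, and the coefficients are those of the series as read here, namely $T^2\{1 - (p + T^2)/2n + [(4 - p^2) + (2 + 5p)T^2 + 8T^4]/24n^2\}$.

```python
import math


def chi_square_from_T2(T2, n, p):
    """Chi-square equivalent of T**2 (p variates, n degrees of freedom), to order n**-2."""
    return T2 * (1.0
                 - (p + T2) / (2.0 * n)
                 + ((4.0 - p * p) + (2.0 + 5.0 * p) * T2 + 8.0 * T2**2)
                 / (24.0 * n**2))


if __name__ == "__main__":
    # With p = 1 the statistic T is Student's t; its square root should be
    # close to the normal deviate 1.960 for the 5 per cent point with n = 30.
    print(math.sqrt(chi_square_from_T2(2.042**2, 30, 1)))
    # Two variates, n = 40: the result is referred to chi-square with 2 d.f.
    print(chi_square_from_T2(9.0, 40, 2))
```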

Columbia University and Washington, D. C. 





ON COMBINED EXPANSIONS OF PRODUCTS OF SYMMETRIC POWER 
SUMS AND OF SUMS OF SYMMETRIC POWER PRODUCTS 
WITH APPLICATIONS TO SAMPLING (Continued) 

By Paul S. Dwyer

PART II. THE FUNDAMENTALS OF SAMPLING

Introduction 

We consider a population of N variates in which every individual possesses a
common attribute. Let the variate $x_i$ be the measure of such an attribute for
individual i. From the N variates it is possible to form $\binom{N}{n}$ different samples
where each sample consists of n variates, $n \le N$.
Each sample has its mean, variance, etc. so that there are $\binom{N}{n}$ means,
variances, etc. The fundamental sampling problem, as interpreted here, is to
find the relation between the moments of the $\binom{N}{n}$ means, and the moments of
the $\binom{N}{n}$ variances in terms of the moments of the moments of the universe.
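
For a very small universe the problem can be exhibited by brute enumeration. The Python below is an editorial sketch only: the function names are hypothetical, sampling is taken to be without replacement, and the sample variance is computed with divisor n, which is an assumed convention.

```python
from itertools import combinations
from statistics import mean, pvariance


def sample_statistics(universe, n):
    """Means and variances of every sample of n drawn without replacement."""
    samples = list(combinations(universe, n))
    means = [mean(s) for s in samples]
    variances = [pvariance(s) for s in samples]    # divisor n (assumed convention)
    return means, variances


def moment_about_mean(values, r):
    """r-th moment of a list of values about its own mean."""
    m = mean(values)
    return sum((v - m) ** r for v in values) / len(values)


if __name__ == "__main__":
    universe = [1, 2, 3, 4, 5]                     # N = 5
    means, variances = sample_statistics(universe, 2)
    print(len(means))                              # number of samples, C(5, 2)
    print(mean(means), moment_about_mean(means, 2))
    print(mean(variances), moment_about_mean(variances, 2))
```

Enumeration of this kind is practicable only for small N and n; the formulas sought express the same moments directly in terms of the moments of the universe.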

Numerous attempts have been made to solve this problem, but each has been 
restricted in some way. It is the aim of Part II to indicate an approach which is 
broad enough to include many of the fundamental variations. 

The first chapter is devoted to a listing of criteria which should be satisfied 
by a theoretical development which is to be considered sufficiently general. 
These criteria might be applied to other statistics but the theory developed 
here is limited to those statistics which are moments (or functions of moments) of 
moments. The first chapter continues with an account of the more significant 
papers which have contributed to a general solution of the problem. No attempt 
is made to indicate a complete history, but rather there is presented a brief 
summary of a number of the most significant contributions.

The second chapter is devoted to definitions and notation. An attempt has 
been made to use conventional notation whenever it is suitable. 

The third chapter deals with some of the fundamental principles which are 
used in the general approach. It presents a crucial part of the argument 
as it shows how various types of sampling problems can be reduced to Carver
functions.

The last three chapters deal with specific applications to some of the simpler 
problems. Chapter IV discusses the case of moments of the mean of the sample.



PAUL S. DWYER


Chapter V considers the mean of the variance and the variance of the variance, 
while Chapter VI gives a large number of formulas, implicitly, in tabular form. 

Chapter I. A Brief History of Previous Contributions 

In order to assist the reader in getting a perspective with reference to previous 
mathematical work on the relations between the moments of the moments of 
the sample and the moments of the complete set of measures (universe), a list of 
criteria 1 is suggested below which might be applied to each contribution. These 
criteria group themselves naturally into two classes. The first eight questions
can be answered categorically, while the remainder are less definite in nature 
and are not so subject to categorical answers. 

1. The Criteria. 1. Does the method apply to one type of frequency distri¬ 
bution only or is it broad enough in scope to include any distribution law? 

2. Is there any restriction as to the size of the sample? 

3. Is there any restriction as to the size of the universe? 

4. Is there any restriction as to the nature of the correlation between ob¬ 
servations? More specifically, is the method applicable only to some particular 
law of formation of the sample such as "drawing with replacements," "drawing
without replacements," etc., or is it broad enough in scope to allow application
to other orderly replacement laws? 

5. Is the application limited to one characteristic (variable) or can a large 
number of characteristics be treated simultaneously? 

6. Is it necessary that the universe maintain the same frequency distribution 
during the formation of the sample or may it assume a different frequency 
distribution before each drawing? 

7. Does the method produce exact, rather than approximate, formulas? 

8. Does the method permit approximations to a required degree of accuracy? 

9. Does the method enable the author to write general laws in a compact 
form? More specifically, can he express, in a form which is not too symbolic, 
any moment of a given sample moment? If not, what order of moments can 
be expressed? 

10. Is the notation such that the general case can be turned into the more 
important special cases with relative ease? 

11. Does the development lead logically to the introduction of new moment 
functions (such as the semi-invariant of Thiele [B'; 209] or the k functions of 
R. A. Fisher {2T, 203}) which are useful in condensing the results? 

12. Is a combinatorial analysis provided so that any given formula, or any 
part of it, can be checked for accuracy without too much effort? 

2. Review of previous results. The articles below have been examined with 
the criteria in mind. No attempt is made to write specific answers to all the


1 Many of these criteria have been suggested, in less explicit form, by Tchouproff [15;
461-471]. The "Introduction" of his Metron paper is recommended for use as a
supplement to the present chapter.



COMBINED EXPANSIONS 




criteria in each case, but rather to indicate the important features of each 
contribution. 

The papers discussed by no means cover all the work on moments of moments,
although a rather complete bibliographical background is available to the reader
who desires to examine the bibliographies attached to the articles mentioned.
Undoubtedly the importance of the articles written in English has been
overemphasized. Since the important contributions of non-English writers (such as
Thiele and Tchouproff) have eventually appeared in English, it does no serious 
harm to refer to the English versions even though the results may have been 
partially antedated by the author in some other language. 

A large number of the earlier results on moments of moments were limited to a 
special case of the problem, usually the case in which the universe is infinite 
and normal. The present summary deals with those authors who, during the 
past four decades, have made real contributions to the problem of generalization. 
A detailed account of the history of moments of moments would include many 
valuable contributions which are not included here. 

It seems expedient to start with Pearson's article "On the Probable Error of
Frequency Constants" [2] which appeared at the opening of the century.
Although by no means the first article in the field, it presented a rather complete 
set of formulas for the case of moments of moments. One advantage of these 
formulas is that they are relatively brief and yet this brevity results from the 
fact that they are approximate. The original paper dealt with the univariate 
case, but it was followed by a later one [6] which discussed the case of more than 
one variable. 

These formulas have played an important rôle in that they have assisted in
making it clear that the moments of moments of samples must be estimated if 
one is to be permitted to draw conclusions from his sampling moments and that 
it is possible to work out formulas which serve as the basis of those estimates. 

Of great importance also was the contribution of T. N. Thiele to the sampling 
problem. Adapting certain ideas of Laplace, he used semi-invariants in which 
to express his results which he published in English in 1903 in "The Theory of 
Observations" [B'; 209]. He took the case of the infinite parent and any law of 
distribution and then worked out moments through the fourth of the variance. 

An earlier contribution of the introductory period was that of Karl Pearson 
in 1899 [1]. This paper is significant in that it provides formulas for the four
moments of the mean when sampling is from a finite universe. The universe is 
not general, but obeys a simple frequency law. 

Another article of this period was that of Robert Henderson (1904) in which 
the first four moments of the mean were given for an infinite universe with any 
frequency law. This article, which was first published in Transactions of the 
Actuarial Society of America [3], was considered so important that it was
republished in 1907 in the British Journal of the Institute of Actuaries. Henderson
gave, in addition to the first four moments of the mean, first moments of $m_2$, $m_3$,
$m_4$, although the last of these formulas is erroneous.





Another important contribution of this period was that of “Student” in
1908 [5]. He was interested in the properties of the normal distribution, but 
did not assume normality in his general derivation. He took an infinite popula¬ 
tion and wrote the formula for the variance of the variance. In this result he 
inserted the condition for normality. His further argument in the normal case 
implied the development of corresponding formulas for the higher moments of 
the variance, but he did not publish them as they were incidental to his main 
attack. The semi-invariant equivalent of these results had been previously 
given by Thiele [B'; 209-210].

The real contribution of “Student” to the general problem of moments of 
moments was his method, for it is his method which has been utilized by later 
writers. “Student's” method has the advantage that the development involves 
algebraic processes only. Contributions of Neyman, Church, Pepper, Carver, 
and the present writer are based upon it. 

An important development during the next decade, 1908-1918, was the 
establishment of the first four moments of the mean when the samples were 
drawn from a finite parent without replacement. It appears that a number of 
men worked this problem independently. For example, one might examine the 
results of Pearson [4], Isserlis [7, 8], Mortara [C], Tchouproff [11], and
Edgeworth [9]. Probably the best English presentations of that era were those of
Isserlis [8] and Edgeworth [9] which appear in the same volume of the Journal 
of the Royal Statistical Society. 

A most prolific writer on sampling during the next decade was the Russian, 
Tchouproff, who had been publishing in Russian and Scandinavian journals
[10], [11]. His most valuable contributions were published in 1918-1923 in 
Biometrika (in English) and in Metron (in English), 

The first series of articles was published in three different numbers of
Biometrika in the years 1918-19 [12]. Tchouproff assumed an infinite universe
and used the method of mathematical expectation. At first glance the most
characteristic aspect of his work appears to be the complicated notation which 
he used. This notation was adopted because he undertook a much more general 
problem than had previously been attempted and hence needed to make new 
distinctions. Although he limited himself to the infinite case and one variable,
he worked out the theory with the freedom that the frequency distribution of the 
universe might change between drawings. In the special case in which the 
populations are the same, he worked out the moments of the variance as far as 
the fourth. The chief criticism of his work concerns the complicated notation 
which seems to have been difficult to follow critically. A mistake in one of his 
formulas was not discovered for some years and then not by examination of his 
reasoning, but through the application of his results to an actual problem [17].

It is perhaps appropriate to insert here that in 1934 Feldman [30] rewrote the 
material of the second Biometrika article by simplifying the notation and
extending the argument to the case of two (and more) variables.

Tchouproff continued to generalize his work and in the 1923 volume of Metron





[15] there appeared a series of articles in which there were no restrictions as to 
the size of the sample, no restrictions as to the type of sampling distribution 
(in fact the sampling distribution might vary between successive drawings), 
and no restrictions as to the law of replacement, or more generally as he expressed 
it, “no restriction as to the nature of the correlation between observations.”
Criterion number 5 is the only one of the first eight criteria which is not satisfied
in as much as the approach is limited to that of a single variable. Also the
notation was extremely complicated and, although Tchouproff gave general
formulas for moments of moments, these formulas are so symbolic in form that
he did not find it expedient to write out specific formulas beyond the variance 
of the variance for such an important special case as sampling from a finite 
parent without replacements. 

During the same period J. Splawa-Neyman [14] had been examining the
problem of sampling from a finite parent without replacements. He published 
his results in a Polish journal in 1923 [14] and his corrected results two years 
later in Biometrika [18]. He gave the well known formulas for the first four 
moments of the mean and a formula for the variance of the variance. He also 
gave some simple correlation formulas such as the correlation between the mean 
and the variance. 

At this time the basic problem of moments of moments, at least as it was 
interpreted by Pearson and his followers, was the establishment of the first four 
moments of the given moment of the sample so that a Pearson curve could be
fitted. A. E. R. Church, a worker in Pearson’s laboratory, was assigned the
task of seeing how the moments of the variance work out in actual practice.
In doing this he became convinced that the formula for the fourth power of the
variance, which had appeared in Tchouproff’s Biometrika article, was incorrect. 
He tried to follow the argument of Tchouproff, but apparently was baffled by 
the complex notation and finally, at the suggestion of Pearson, decided to carry 
through the formula using the method of “Student.” In doing this he
discovered a mistake in the Tchouproff formula for the fourth power of the variance.
At the same time he published [17] the formulas for the third and fourth power 
of the variance in the more conventional notation of that time. 

It might be noted that it is particularly fitting that Church should discover 
this error since Tchouproff, as Pearson himself stated in an editorial [13], had 
pointed out a number of errors made by the Pearsonian school.

In the next volume of Biometrika there appears an article by Church [19] in 
which, among other things, formulas are derived for the third and fourth
moments of the variance in the case of a finite population, sampling without 
replacement. Church claimed no particular credit for these formulas. His 
point is rather that they are almost valueless from a practical standpoint chiefly 
because of their length. The formula for the fourth power of the variance 
occupies three and one-half of the large pages of Biometrika and is given with 
the apparent aim of indicating, as Pearson said [21; 209], “the practical futility
of the theoretical formulas.”





Church gave full credit to Neyman for the formula for the variance of the
variance and made no mention of Tchouproff’s Metron work and of the more
general presentation there given. This was particularly unfortunate because
it exposed him to the charge that he ignored non-English authors. This charge
was immediately made by Greenwood and Isserlis [20] who broadened it to
include Neyman and, by implication, Pearson himself. They advocated the
case of Tchouproff who, now dead, was unable to defend himself. They gave a
survey (valuable to the cursory reader) of the pertinent contributions of the
Tchouproff articles and suggested that the ignoring of Tchouproff was
particularly disconcerting since it appears that Tchouproff had gone more than half
way in his cooperation with English writers.

Pearson replied in an interesting article [21] which made it clear that Neyman 
established his results independently of Tchouproff and that the language of 
Neyman is much simpler than the complicated notation of Tchouproff. Pearson 
emphasized that Tchouproff made no attempt to give specific formulas for the
third and fourth moments of the variance in the case of sampling with
replacements. Pearson did not answer, at least explicitly, the claim that the
Tchouproff formulas are applicable to a more general case in which there is no
restriction as to the nature of the correlation between observations.

The year 1928 was marked by two important contributions. We first mention
that of C. C. Craig who published his thesis in Metron [22]. Extending the
previous results of Thiele, he was able to write the semi-invariant equivalent
of the basic formulas in much less space than their previous moment formulation
had demanded. He was able to write products of sample moments as well as
moments of the moments themselves. His results are limited to an infinite
population and one variable. The bibliography attached to his paper is
commonly mentioned in later literature for its completeness. For infinite sampling
it might properly be used as a supplement to the bibliography of this Part.

A most important contribution was made by R. A. Fisher [23] who was able
to simplify the infinite sampling formulas greatly. He did this by introducing
the sample function whose expected value is a cumulant (semi-invariant). In
addition to the simplification, his ingenious attack resulted in the following
contributions: (1) the recognition of the one to one correspondence between all
possible independent sampling formulas and the partition of numbers, (2) that
the extension of the multivariate form is accomplished by use of the partitions
of multipartite numbers, (3) the tabulation of numerous new formulas, (4)
the use of a general partition method by which any term in the formulas can be
determined separately.

The further development of the combinatorial analysis was indicated by a 
paper by Fisher and Wishart which appeared in 1931 [27]. It was shown how
the more involved patterns could be broken up into simpler ones. 

The study of the infinite case was continued by Georgescu [28] who extended 
the Craig results. A feature of his work was the utilization of functions which 
yielded expansions of formulas in terms of successive degrees of approximation.
He applied Fisher’s idea of a combinatory analysis to the conventional sample
moment function.

Another paper of this series was that of Wishart [29] who gave a descriptive
account of the contributions of Craig, Fisher, and Georgescu and an indication
of the means of translating the results of one writer into the language of another.

The work of Joseph Pepper which appeared in Biometrika in 1929 [24] should
be noted. Pepper took the case of the finite parent, sampling without
replacement, and two variables, and then gave an extensive list of results. He did not
have a very condensed notation and was forced to assume an infinite universe
for the higher moments which he studied. The important point, for historical
purposes, is that Pepper combined bivariate and finite sampling. It is to be
recalled that Tchouproff himself in his generalized theory gave no results for the
multivariate case.

A significant advance in finite sampling was indicated by the appearance of
Carver’s editorial on “Fundamentals of the Theory of Sampling,” which
appeared in the first volume of the Annals of Mathematical Statistics [25].
Carver took the case of a finite universe, one variable, and sampling without
replacements. He presented a notation which enabled him to write the various
moments of the mean through the eighth in simple form. He showed by a
number of illustrations that his formula would give known results for cases
both infinite and finite, when the proper restrictions were added. O’Toole [26]
later generalized his results for any moment of the mean.

3. Generalized Carver Functions and Sampling. The use of generalized
Carver functions together with the results of Part I makes possible the
presentation of the general sampling theory in a compact, and yet not too symbolic,
form. It is possible to write the sampling theory so that criteria 1-8 are satisfied
although no attempt is made in the present paper to answer criterion 6. With
reference to criteria 9-11, any affirmative answer must necessarily be tempered
with qualifications as the results are far removed from that ideal solution which
would permit one to determine the actual distribution of any sample moment.
However the use of generalized Carver functions does permit a general concise
statement of results as well as the determination of special cases. The method
is also especially adapted to the introduction of new moment functions and to
the use of partition analysis, although these topics are not emphasized in the
present paper. In general it may be said that the use of Carver functions assists
greatly in finding the theoretical sample statistics in the case of finite sampling
since the Carver functions are condensed expressions of the size of the sample
and the size of the parent, since they may be easily checked from symmetrical
considerations, and since they are independent of the moments. They are also
applicable to different replacement laws.

4. The Use of High Moments. Precise agreement between theoretical and
practical sampling does not usually accompany the use of high moments, and
the practical statistician is apt to agree with Pearson who wrote, “I have a very
firm conviction that the mathematician who uses high moments may make
interesting contributions to mathematics, but he removes his work from any
contact with actual statistics” [10; 117]. However, since the extent of agreement
between theoretical and actual results is in a sense a measure of the extent to
which theoretical assumptions are actually duplicated in the experiment, it
does seem sensible to discover what relations exist in the ideal theoretical case.
Thiele implicitly supported the theoretical use of high moments (even in studying
actual problems) when he wrote (IV; VA\’

“Therefore the general rule of the formation of good laws of presumptive
errors must be:

1. In determining Xj, and h rely almost entirely upon the actual values. 

2. As to the half-invariants with high indices, say h upwards, rely as
exclusively upon theoretical considerations.

3 — " 

A more explicit advocate is R. A. Fisher who wrote [23; 200], “In the present
state of our knowledge any information, however incomplete, as to sampling
distributions is likely to be of frequent use, irrespective of the fact that moment
functions only provide statistical estimates of high efficiency for a special type of
distribution.”


Chapter II. Notation and Definition

The present chapter gives the fundamental definitions and appropriate
notation. An attempt has been made to combine the most desirable features
of the different notations of earlier writers.


5. Ordered Sample. An ordered sample is a sample in which distinction is
made as to the order in which the variate enters the growing sample. Thus
the sample found by drawing x_1 and then x_2 is the same sample as that obtained
by drawing x_2 and then x_1, but it is a different ordered sample.

In some types of sampling it is possible that a given variate may appear more
than once in the same sample. In general the number of ordered samples
varies with the number of repeated variates. Thus the sample x_1 + x_1 results
from but one ordered sample, while x_1 + x_2 results from either of two ordered
samples.
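
The distinction between a sample and an ordered sample can be made concrete by brute-force enumeration. The following is only an illustrative sketch: the function name `ordered_samples` and the three-variate universe are assumptions introduced here, and variates are represented by their index so that equal values would still count as distinct variates.

```python
from collections import Counter
from itertools import permutations, product

def ordered_samples(universe, n, replace):
    """All ordered samples of size n, as tuples of variate indices (hypothetical helper)."""
    idx = range(len(universe))
    if replace:
        return list(product(idx, repeat=n))   # a variate may appear more than once
    return list(permutations(idx, n))         # no variate is repeated

universe = ["x1", "x2", "x3"]                 # illustrative universe only
samples = ordered_samples(universe, 2, replace=True)

# Group the ordered samples by the unordered sample they produce.
counts = Counter(tuple(sorted(s)) for s in samples)
```

Here `counts[(0, 0)]` is 1, since x_1 + x_1 arises from a single ordered sample, while `counts[(0, 1)]` is 2, since x_1 + x_2 arises from either of two ordered samples.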


6. Power Sums. Power sums have the same meaning as in section 11 of
Part I. An adjustment of notation is necessary as we need to distinguish power
sums of the sample from power sums of the universe. The a-th power sum of
the universe is denoted by (A) while the sample power sum is denoted by (a).
Similarly, bold-faced numerals are used to indicate power sums of the universe,
while light-faced numerals are used to indicate power sums of the sample. The
symbol (\bar{A}) is used to indicate that the variates are deviations from the mean of
the universe.





7. Power Product Sums. Power product sums, called power products for
brevity, also have the same meaning as in section 11 of Part I. Large letters
are used to represent the power products of the universe while small letters are
used to indicate the power products of the sample. Thus (Q_1 Q_2 \cdots Q_s)
represents a power product of the universe while (q_1 q_2 \cdots q_s) represents the
corresponding power product of the sample. Power products are not used extensively
except in the development of the theory of the next chapter where they play an
important rôle.

8. Expected Values. If a given statistical function, z, is formed for every
possible sample, then the arithmetic mean of the z’s is the expected value of z.
Thus E(z) = \frac{\sum z}{S}, where the \sum holds for all possible samples and S is the
number of such samples.
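
This definition translates directly into an average over an exhaustive list of ordered samples. The sketch below is illustrative only: `expected_value`, `sample_mean`, and the numerical universe are assumptions made for the example, and exact rational arithmetic is used so that comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import permutations, product

def expected_value(stat, universe, n, replace=False):
    """Arithmetic mean of stat(sample) over all S ordered samples of size n."""
    if replace:
        samples = list(product(universe, repeat=n))
    else:
        samples = list(permutations(universe, n))
    return sum(Fraction(stat(s)) for s in samples) / len(samples)

def sample_mean(sample):
    return Fraction(sum(sample), len(sample))

universe = [1, 2, 4]                      # illustrative values only
e_mean = expected_value(sample_mean, universe, 2)
```

For this universe the six ordered samples of two have means summing to 14, so `e_mean` equals 7/3, which is also the mean of the universe.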


9. Moments. Moments demand precise notation since distinction must be 
made between moments of the universe, moments of the sample, moments of 
the moments of the sample, and moments about the mean for these cases. In 
addition we wish to indicate whether or not the universe is measured about its 
mean. 

a. Moments of the universe. The conventional \mu’s are used to indicate the
moments of the universe. In this notation \bar{\mu} is used to indicate the moment
about the mean of the universe. Thus

    \mu_t = \frac{(t)}{N} = \frac{\sum x^t}{N}

and

    \bar{\mu}_t = \frac{(\bar{t})}{N} = \frac{\sum (x - \mu_1)^t}{N}.

The usual formula relating \mu and \bar{\mu} [22; 20] may be written

    \bar{\mu}_t = \sum_{s=0}^{t} (-1)^s \binom{t}{s} \mu_1^{s} \mu_{t-s}    {1}

so that

    \bar{\mu}_2 = \frac{(2)}{N} - \frac{(1)(1)}{N^2},

    \bar{\mu}_3 = \frac{(3)}{N} - \frac{3(2)(1)}{N^2} + \frac{2(1)^3}{N^3},

etc.

It is to be noted that, when (1) = 0, \mu_t = \bar{\mu}_t.

b. Moments of the sample. We denote the moments of the sample by the
letter m [23; 203].

In much statistical work deviations from the mean of the universe are used in
place of the variates themselves. When the universe moments about the mean
appear, we indicate them with a bar. However in denoting the moments of the
samples, the moments of the mean do not appear and some other device is
needed to indicate whether or not the variates are measured about the mean of
the universe. The simple notations m_t and M_t are used to indicate that the
variates used are deviations from the mean of the universe. A superprefix is
used to indicate the case in which the variates are not measured about the
mean, 'm_t, 'M_t. The values of M_t (and 'M_t) are obtained from the values of
m_t (and 'm_t) by means of the formula

    M_t = \sum_{s=0}^{t} (-1)^s \binom{t}{s} m_1^{s} m_{t-s}.    {2}
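
The binomial relation between moments about an arbitrary origin and moments about the mean can be checked numerically. The sketch below assumes the standard form of that relation, \bar{\mu}_t = \sum_s (-1)^s \binom{t}{s} \mu_1^s \mu_{t-s}; the function names and the data are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from math import comb

def raw_moment(xs, t):
    """t-th moment about the origin: (t)/N in the power-sum notation."""
    return Fraction(sum(x**t for x in xs), len(xs))

def central_moment_direct(xs, t):
    """t-th moment about the mean, computed from the deviations themselves."""
    mean = raw_moment(xs, 1)
    return sum((x - mean)**t for x in xs) / len(xs)

def central_moment_from_raw(xs, t):
    """Same quantity via the binomial relation (assumed standard form)."""
    mu1 = raw_moment(xs, 1)
    return sum((-1)**s * comb(t, s) * mu1**s * raw_moment(xs, t - s)
               for s in range(t + 1))

xs = [1, 2, 4, 7]                         # illustrative values only
agree = all(central_moment_direct(xs, t) == central_moment_from_raw(xs, t)
            for t in range(6))
```

The same function applies unchanged to a sample, which is the situation of the moments about the sample mean.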

c. Moments of the moments of a sample. Since there are many possible
samples and since a given moment can be computed for each sample, it is
possible to express the expected value of this moment and the expected value of
any power of it. The \mu’s are used for this purpose. Thus

    \mu_r(m_t) = E(m_t)^r

    \mu_r('m_t) = E('m_t)^r.    {3}

If the first one of equations {3} represents the whole group, then the values
\bar{\mu}_r(m_t), \bar{\mu}_r('m_t), \bar{\mu}_r(M_t), and \bar{\mu}_r('M_t) are indicated by

    \bar{\mu}_r(m_t) = \sum_{s=0}^{r} (-1)^s \binom{r}{s} [\mu_1(m_t)]^{s} \mu_{r-s}(m_t).    {4}



d. Moments of the product of the moments of a sample. The term \frac{\sum xy}{N} can be
indicated by E(xy) = \mu_{11}(x, y). Similarly the expected value of the product of
m_a and m_b may be indicated by E(m_a m_b) = \mu_{11}(m_a, m_b). In general

    \mu_{r_1 r_2 \cdots r_s}(m_{a_1}, m_{a_2}, \ldots, m_{a_s}) = E(m_{a_1}^{r_1} m_{a_2}^{r_2} \cdots m_{a_s}^{r_s}).    {5}

In the case of the product of sample moment functions, when the universe is
not measured about its mean, it is preferable to use a single superprefix,
associated with the \mu, instead of a number of them associated with each m function.
Thus

    \mu_{111}('m_a, 'm_b, 'm_c) = '\mu_{111}(m_a, m_b, m_c).

The usual laws for changing from moments to moments about the mean in the
case of the multivariate distributions are available. Thus

    \bar{\mu}_{11}(m_a, m_b) = \mu_{11}(m_a, m_b) - \mu_{10}(m_a, m_b) \mu_{01}(m_a, m_b),    {6}

    \bar{\mu}_{111}(m_a, m_b, m_c) = \mu_{111}(m_a, m_b, m_c) - \mu_{110}(m_a, m_b, m_c) \mu_{001}(m_a, m_b, m_c)
        - \mu_{101}(m_a, m_b, m_c) \mu_{010}(m_a, m_b, m_c)
        - \mu_{011}(m_a, m_b, m_c) \mu_{100}(m_a, m_b, m_c)
        + 2 \mu_{100}(m_a, m_b, m_c) \mu_{010}(m_a, m_b, m_c) \mu_{001}(m_a, m_b, m_c),    {7}

etc.

10. Different Sampling Laws. For theoretical purposes, any law may be used
in the formation of samples as long as it results in functions of all possible samples
which are symmetric functions of the variates. Any uniform law of replacement
satisfies this condition and hence might be used in forming samples. Most
statisticians who have worked on the sampling problem have been content to
assume one or the other of two replacement laws. Each of these is “natural,”
since it has wide application in the study of actual sampling.

The two types of sampling which have received general treatment are sampling
from an infinite universe with any law of replacement and sampling from a finite
universe with a law of no replacements. The results of the first type are also
applicable to the case of sampling from a finite universe when replacements are
made after each drawing. These two types of sampling have been characterized
by the terms “sampling from an infinite universe,” or “sampling from an
unlimited supply” [25; 114] and “sampling from a finite universe” [17], or
“sampling from a limited supply” [25; 101].

The theory of moments of moments for the first type of sampling has been
developed to a high degree by such authors as Craig [22], Fisher [23], and
Georgescu [28]. This extensive development has been due in part to the fact
that the assumption of an infinite universe permits application of methods
which are not applicable to the study of finite variation. The probability of
getting a variate remains the same no matter what the law of replacement.
The assumption of an infinite universe at first appears to make the results
inapplicable to all actual problems where the universe is finite. However, if the
universe is large, the assumption of infinite size does not greatly alter the results,
although the extent of the change can not be determined without comparison
with the results of finite sampling. A justification for the use of infinite
sampling in actual finite sampling problems is based on the fact that the formulas
resulting from sampling from a finite parent with replacements are the same as
the infinite formulas. Hence the infinite results may be used to characterize
finite sampling if sampling is done with replacement after each drawing. This
clever scheme is somewhat invalidated, in actual sampling, because of the
impracticability of replacing and remixing after each drawing. Until someone
demonstrates a technique which is practical and effective in securing randomness,
it must be said that the value of infinite sampling theory as applied to finite
sampling depends upon the theoretically unsatisfactory assumption that a
finite universe is infinite.

The theory of sampling from a finite universe without replacements has been
developed by such authors as Isserlis [8], Tchouproff [15], Neyman [18], Church
[19], Pepper [24], and Carver [25], although available results are not as extensive
as those mentioned above because of the difficulty of algebraic manipulation
and because of the length of the formulas. The fact is that the probability of
getting a given variate varies with the different drawings. However, a “return
to the bag” is not demanded.

The terms “infinite sampling” and “finite sampling” are adequate to describe
the two kinds of sampling discussed above, but they are inadequate in the case
of finite sampling if additional replacement laws are introduced. Hence, it
seems preferable to characterize the type of sampling by the replacement law
if the population is finite.

When the Carver functions represent known functions of n and N, it is
possible to use them in writing moment formulas for any orderly replacement
law. For example, it is shown in later sections how Carver functions can be
applied to

1. Finite sampling without replacement.

2. Finite sampling with replacement after each drawing.

3. Finite sampling without replacement up to the n-th drawing before which
the n - 1 withdrawn variates are replaced and mixed.

The Carver function can be used symbolically even in cases in which its
explicit statement in terms of n and N has not been found. In some statistical
formulas the Carver functions cancel, so that the results are independent of the
sampling law.

11. Variable Distribution Laws. It is possible to generalize the theory to
include the case in which the variable takes on a different frequency distribution
after each drawing, i.e., the general Tchouproff formulas can be written in terms
of Carver functions. This theory can also be generalized to include many 
variables. In this dissertation, however, it is assumed that the universe remains 
the same, aside from the unreplaced variates forming the sample, throughout 
the sampling process. 

Chapter III. The Application of the Double Expansion Theorem 

It is the purpose of this chapter to establish the basic theorems on which the 
more specific work of the later chapters is based and to show how the double 
expansion theorem is to be applied to the sampling problem. 

12. Formulas Concerning Ordered Samples. a. Sampling with replacements.
If the samples of n are taken from a universe of N variates and if the variates are
replaced after each drawing, then the number of possible ordered samples is N^n
since for each of the n drawings there is a choice of N.

b. Sampling without replacement. If the variate is not replaced after each
drawing, the number of ordered samples is

    N(N - 1) \cdots (N - n + 1) = N^{(n)}.

c. Replacement before the last drawing only. In case sampling is with
replacement before the last drawing only, the number of ordered samples is

    N(N - 1) \cdots (N - n + 2) N = N^{(n-1)} N.
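
The three counts can be confirmed by enumerating the ordered samples directly. This is a minimal sketch; the function name `count_ordered_samples` and the string labels for the three laws are assumptions introduced for the illustration.

```python
from itertools import permutations, product
from math import perm

def count_ordered_samples(N, n, law):
    """Count ordered samples of n from N variates by explicit enumeration."""
    units = range(N)
    if law == "with_replacement":
        return sum(1 for _ in product(units, repeat=n))
    if law == "without_replacement":
        return sum(1 for _ in permutations(units, n))
    if law == "replace_before_last":
        # first n - 1 drawings without replacement, last drawing from all N
        return sum(1 for _head in permutations(units, n - 1) for _last in units)
    raise ValueError(f"unknown law: {law}")

N, n = 5, 3                                # illustrative sizes only
counts = {law: count_ordered_samples(N, n, law)
          for law in ("with_replacement", "without_replacement", "replace_before_last")}
```

With N = 5 and n = 3 the enumeration gives 125, 60, and 100 ordered samples, in agreement with N^n, N^{(n)}, and N^{(n-1)} N.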


13. Theorem I. All moments of moment functions of samples can be expressed
in terms of expected values of products of power sums of samples.

By moment functions we mean rational integral isobaric moment functions
[31; 22].

The theorem follows at once from the definitions of section 9. From {3}, {4},
{5}, {6}, {7} it is clear that all moments of moment functions of samples are
expressible in terms of the expected values of sample moment functions. But
since the sample moment functions are themselves defined in terms of power
sums of the samples, the theorem follows. For example


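    \bar{\mu}_2(M_2) = \mu_2(M_2) - \mu_1^2(M_2)
        = E[\frac{(2)}{n} - \frac{(1)(1)}{n^2}]^2 - \{E[\frac{(2)}{n} - \frac{(1)(1)}{n^2}]\}^2    {8}

and

    \bar{\mu}_{11}(M_2, m_1) = \mu_{11}(M_2, m_1) - \mu_{10}(M_2, m_1) \mu_{01}(M_2, m_1)
        = E[\frac{(2)(1)}{n^2} - \frac{(1)(1)(1)}{n^3}] - E[\frac{(2)}{n} - \frac{(1)(1)}{n^2}] E[\frac{(1)}{n}].    {9}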


14. Theorem II. All moments of moment functions of samples can be expressed
in terms of expected values of power products of samples.

This follows at once from the application of the multiplication theorem of
Part I to the theorem of section 13. Each product of power sums is expanded
by the multiplication theorem into sums of power products. Thus


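    \mu_2(M_2) = E[\frac{(2)(2)}{n^2} - \frac{2(2)(1)(1)}{n^3} + \frac{(1)^4}{n^4}]

        = (\frac{1}{n^2} - \frac{2}{n^3} + \frac{1}{n^4}) E(4) + (-\frac{4}{n^3} + \frac{4}{n^4}) E(31) + (\frac{1}{n^2} - \frac{2}{n^3} + \frac{3}{n^4}) E(22)

        + (-\frac{2}{n^3} + \frac{6}{n^4}) E(211) + \frac{1}{n^4} E(1111).    {10}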
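
Because the expansion into power products is an algebraic identity, it can be checked on any single sample, after which the statement about expected values follows by linearity. The sketch below assumes that a power product (q_1 q_2 \cdots q_s) is the sum of x_i^{q_1} x_j^{q_2} \cdots over ordered choices of distinct variates, and that the coefficients are those obtained from the multiplication theorem; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def power_product(xs, exps):
    """(q1 q2 ... qs): sum of x_i^q1 * x_j^q2 * ... over ordered distinct variates."""
    return sum(prod(x**q for x, q in zip(chosen, exps))
               for chosen in permutations(xs, len(exps)))

def m2_squared_direct(xs):
    """Square of the second moment about the sample mean, [(2)/n - (1)(1)/n^2]^2."""
    n = len(xs)
    s1, s2 = sum(xs), sum(x * x for x in xs)
    return (Fraction(s2, n) - Fraction(s1 * s1, n * n))**2

def m2_squared_expanded(xs):
    """Same quantity as a linear combination of power products."""
    n = Fraction(len(xs))
    coeffs = {
        (4,): 1 / n**2 - 2 / n**3 + 1 / n**4,
        (3, 1): -4 / n**3 + 4 / n**4,
        (2, 2): 1 / n**2 - 2 / n**3 + 3 / n**4,
        (2, 1, 1): -2 / n**3 + 6 / n**4,
        (1, 1, 1, 1): 1 / n**4,
    }
    return sum(c * power_product(xs, q) for q, c in coeffs.items())

sample = [1, 2, 4, 7, 11]                  # illustrative values only
```

For a sample of two, such as [1, 3], the power products (211) and (1111) are empty and both functions return 1.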


15. Theorem III. To every power product form (q_1 q_2 \cdots q_s) there corresponds
a power product form (Q_1 Q_2 \cdots Q_s).

The argument is simple since the terms of (q_1 q_2 \cdots q_s) are themselves terms of
(Q_1 Q_2 \cdots Q_s). It follows at once that, if (q_1 q_2 \cdots q_s) exists, then (Q_1 Q_2 \cdots Q_s)
exists.





As au illustration, consider the universe consisting of #i t j £ 4 , and the 

4 _ 

sample consisting of Xa, Xt . Then the terms of (fMstfs) = J2 %Vi x]\ ] 

5 

are all contained in the terms of (QiQM = 2 #?! * 


16. Theorem IV. If definite k’s can be determined so that

    E(q_1 q_2 \cdots q_s) = k_{p_1 p_2 \cdots p_s} (Q_1 Q_2 \cdots Q_s),    {11}

then it is possible to use the double expansion theorem and express the moments of
the moments of the sample in terms of the P functions of Part I and the power sums
(or moments) of the universe.

The double expansion theorem was designed to replace (q_1 q_2 \cdots q_s) by
k_{p_1 p_2 \cdots p_s} (Q_1 Q_2 \cdots Q_s). It can be used as well to replace E(q_1 q_2 \cdots q_s) by
k_{p_1 p_2 \cdots p_s} (Q_1 Q_2 \cdots Q_s) if the values of k_{p_1 p_2 \cdots p_s} can be determined. The results
of such a substitution in terms of the power sums of the universe are then given
by the double expansion theorem. For example

    \mu_2(m_1) = \frac{E(1)^2}{n^2} = \frac{E(2)}{n^2} + \frac{E(11)}{n^2}

and if E(2) = k_2 (2) and E(11) = k_{11} (11) then

    \mu_2(m_1) = (k_2 - k_{11}) \frac{(2)}{n^2} + k_{11} \frac{(1)(1)}{n^2} = P_2 \frac{(2)}{n^2} + P_{11} \frac{(1)(1)}{n^2}

where P_2 = k_2 - k_{11} and P_{11} = k_{11}.

It then appears that the methods and tables of Chapter I of Part I can be
used in finding expressions for moments of moments, in case k_{p_1 p_2 \cdots p_s} is known.
Thus

    \mu_2(M_2) = E[\frac{(2)}{n} - \frac{(1)(1)}{n^2}]^2
        = E[\frac{(2)(2)}{n^2} - \frac{2(2)(1)(1)}{n^3} + \frac{(1)(1)(1)(1)}{n^4}]

        = \frac{P_2 (4) + P_{11} (2)(2)}{n^2}
          - \frac{2[P_3 (4) + 2P_{21} (3)(1) + P_{21} (2)(2) + P_{111} (2)(1)(1)]}{n^3}

          + \frac{P_4 (4) + 4P_{31} (3)(1) + 3P_{22} (2)(2) + 6P_{211} (2)(1)(1) + P_{1111} (1)^4}{n^4}    {12}

and when (1) = 0

    \mu_2(M_2) = (\frac{P_2}{n^2} - \frac{2P_3}{n^3} + \frac{P_4}{n^4}) (4) + (\frac{P_{11}}{n^2} - \frac{2P_{21}}{n^3} + \frac{3P_{22}}{n^4}) (2)(2),    {13}

where

    P_4 = k_4 - 4k_{31} - 3k_{22} + 12k_{211} - 6k_{1111}
    P_{31} = k_{31} - 3k_{211} + 2k_{1111}
    P_{22} = k_{22} - 2k_{211} + k_{1111}
    P_{211} = k_{211} - k_{1111}
    P_{1111} = k_{1111}

as given by {54} of Part I.

The basic problem has thus been reduced to finding k_{p_1 p_2 \cdots p_s} such that

    E(q_1 q_2 \cdots q_s) = k_{p_1 p_2 \cdots p_s} (Q_1 Q_2 \cdots Q_s).


17. Theorem V. The expected value of a sample power sum is always \frac{n}{N} times
the corresponding universe power sum no matter what the replacement law.

The expected value of the sample power sum is always the same even though
the k’s take on different values for different replacement laws. We note first
that the number of ordered samples, S, depends upon the replacement law.
Now a given sample power sum, (a), has n terms, while the corresponding
power sum of the universe, (A), has N terms. All the a-th powers of the
variates in the universe appear in the ordered samples and, if we add all possible
ordered samples, these terms appear the same number of times. Hence

    \sum (a) = M (A).    {14}

Now the number of the a-th powers of the variate in \sum (a) is Sn so that each of
the N variates appears \frac{Sn}{N} times. It follows that \sum (a) = \frac{Sn}{N} (A) and hence
that E(a) = \frac{n}{N} (A). Hence

    E(a) = k_1 (A) where k_1 = \frac{n}{N}    {15}

no matter what the law of replacement.

An illustration may serve to clarify the argument. Consider a universe
composed of x_1, x_2, x_3 and write the six ordered samples of two. Then

    \frac{\sum (a)}{(A)} = \frac{(x_1^a + x_2^a) + (x_1^a + x_3^a) + (x_2^a + x_1^a) + (x_2^a + x_3^a) + (x_3^a + x_1^a) + (x_3^a + x_2^a)}{x_1^a + x_2^a + x_3^a} = 4

and

    \frac{E(a)}{(A)} = \frac{2}{3} = \frac{n}{N}.
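
Theorem V can be checked by enumeration under each of the three replacement laws discussed earlier. The following is a sketch under stated assumptions: the helper names and the numerical universe are hypothetical, and the third law is taken to mean that the first n - 1 drawings are made without replacement and the last drawing is made from the whole universe.

```python
from fractions import Fraction
from itertools import permutations, product

def samples_without(universe, n):
    return list(permutations(universe, n))

def samples_with(universe, n):
    return list(product(universe, repeat=n))

def samples_replace_before_last(universe, n):
    # assumed reading of the third law: replace and mix just before the last drawing
    return [head + (last,) for head in permutations(universe, n - 1)
            for last in universe]

def expected_power_sum(samples, a):
    """E(a): average of the sample power sum over all ordered samples."""
    total = sum(sum(x**a for x in s) for s in samples)
    return Fraction(total, len(samples))

universe, n, a = [1, 2, 5, 7], 3, 2        # illustrative values only
universe_power_sum = sum(x**a for x in universe)          # (A)
target = Fraction(n, len(universe)) * universe_power_sum  # (n/N)(A)
results = [expected_power_sum(make(universe, n), a)
           for make in (samples_without, samples_with, samples_replace_before_last)]
```

All three entries of `results` equal `target`, although the number of ordered samples S differs from law to law.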


18. Value of k_{p_1 p_2 \cdots p_s} for sampling without replacement. Consider a universe
and all possible ordered samples. Form (Q_1 Q_2 \cdots Q_s) and \sum (q_1 q_2 \cdots q_s).
Now \sum (q_1 q_2 \cdots q_s) is a symmetric function of the variates and consists of
N^{(n)} n^{(s)} products, and (Q_1 Q_2 \cdots Q_s) consists of N^{(s)} products. Each of the
N^{(s)} products is repeated the same number of times in the N^{(n)} n^{(s)} products of
\sum (q_1 q_2 \cdots q_s). To find the number of times such repetition is made, it is only
necessary to divide the total number of terms in \sum (q_1 q_2 \cdots q_s) by the number of
terms in (Q_1 Q_2 \cdots Q_s) which gives \frac{N^{(n)} n^{(s)}}{N^{(s)}}. Hence

    \sum (q_1 q_2 \cdots q_s) = \frac{N^{(n)} n^{(s)}}{N^{(s)}} (Q_1 Q_2 \cdots Q_s)    {16}

and, dividing by the number of ordered samples, N^{(n)},

    E(q_1 q_2 \cdots q_s) = \frac{n^{(s)}}{N^{(s)}} (Q_1 Q_2 \cdots Q_s)    {17}

so that

    k_{p_1 p_2 \cdots p_s} = \frac{n^{(s)}}{N^{(s)}}    {18}

as stated in section 46 of Part I.

Since (&$ • - • q*) «* Silsjf • • ■ Sphtftofr * * * q>) 
and (Q\Qt • • < Q») = SiW •' • • • • Q«) 

it follows that

    E M(q_1 q_2 \cdots q_s) = \frac{n^{(s)}}{N^{(s)}} M(Q_1 Q_2 \cdots Q_s).    {19}

Most earlier writers on finite sampling have used the idea expressed in {19}
as the foundation of their work. They have found it necessary to undertake
enormous algebraic manipulation to expand in terms of monomial symmetric
functions and then to expand back in terms of power sums after making the 
coefficient adjustment. Such long derivations are not only laborious, but they 
are also apt to result in algebraic errors and the results obtained have not 
emphasized the symmetry which is inherent in the nature of the problem and 
which is very useful in checking calculations. It was Carver who first discovered 
the type of symmetric relation involved and who used it in obtaining a compact 
statement of the first eight moments of the sample sum in the case of a single 
variable. He, too, found it necessary to carry out extensive algebraic
manipulations as his reference to “lavish use of symmetric functions” [25; 104]
reveals. His keen insight into the essential nature of this problem led him to
the conclusion that such extensive algebraic manipulation should not be
necessary and that it should be possible to apply P functions to sample moments
of order higher than the first. His confidence that this could be done and his
encouragement in the task have contributed in a large degree to whatever merit
this dissertation may have.


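With k_{p_1 p_2 \cdots p_s} = \frac{n^{(s)}}{N^{(s)}}, it is at once possible to write the P function expansions.
Following Carver, we let \rho_1 = \frac{n}{N}, \rho_2 = \frac{n^{(2)}}{N^{(2)}}, etc. and get, from sections
43 and 44 of Part I,

    P_1 = \rho_1                                   P_{11} = \rho_2
    P_2 = \rho_1 - \rho_2                          P_{21} = \rho_2 - \rho_3
    P_3 = \rho_1 - 3\rho_2 + 2\rho_3               P_{31} = \rho_2 - 3\rho_3 + 2\rho_4
    P_4 = \rho_1 - 7\rho_2 + 12\rho_3 - 6\rho_4    P_{22} = \rho_2 - 2\rho_3 + \rho_4
    etc.                                           etc.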

19. Expected Values of Products of Sample Power Sums, Sampling Without
Replacement. The tables of Chapter I of Part I are now available for use.
Thus

    \mu_3(m_1) = E(m_1^3) = \frac{1}{n^3} E(1)^3 = \frac{1}{n^3} [P_3 (3) + 3P_{21} (2)(1) + P_{111} (1)^3]    {20}

where

    P_3 = \frac{n}{N} - \frac{3n(n - 1)}{N(N - 1)} + \frac{2n(n - 1)(n - 2)}{N(N - 1)(N - 2)}

    P_{21} = \frac{n(n - 1)}{N(N - 1)} - \frac{n(n - 1)(n - 2)}{N(N - 1)(N - 2)}

    P_{111} = \frac{n(n - 1)(n - 2)}{N(N - 1)(N - 2)}.

Formula {20} might be written as

    \mu_3(m_1) = \frac{1}{n^3} [P_3 N \mu_3 + 3P_{21} N^2 \mu_2 \mu_1 + P_{111} N^3 \mu_1^3].    {21}

We note further that as N \to \infty

    N P_3 \to n,    P_{21} N^2 \to n(n - 1),    P_{111} N^3 \to n(n - 1)(n - 2)

so that

    \mu_3(m_1) = \frac{1}{n^3} [n \mu_3 + 3n(n - 1) \mu_2 \mu_1 + n(n - 1)(n - 2) \mu_1^3].    {22}
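
The third moment of the sample mean and the limiting behaviour of the Carver functions can both be examined numerically. The following sketch assumes the expressions P_3 = \rho_1 - 3\rho_2 + 2\rho_3, P_{21} = \rho_2 - \rho_3, and P_{111} = \rho_3 with \rho_i = n^{(i)}/N^{(i)}; the function names are hypothetical and the universe is purely illustrative.

```python
from fractions import Fraction
from itertools import permutations

def falling(x, k):
    """Falling factorial x(x-1)...(x-k+1)."""
    out = 1
    for j in range(k):
        out *= x - j
    return out

def rho(n, N, i):
    return Fraction(falling(n, i), falling(N, i))

def carver_p3(n, N):
    return rho(n, N, 1) - 3 * rho(n, N, 2) + 2 * rho(n, N, 3)

def carver_p21(n, N):
    return rho(n, N, 2) - rho(n, N, 3)

def carver_p111(n, N):
    return rho(n, N, 3)

def third_moment_formula(universe, n):
    """(1/n^3)[P_3 (3) + 3 P_21 (2)(1) + P_111 (1)^3], sampling without replacement."""
    N = len(universe)
    s1, s2, s3 = (sum(x**t for x in universe) for t in (1, 2, 3))
    bracket = (carver_p3(n, N) * s3 + 3 * carver_p21(n, N) * s2 * s1
               + carver_p111(n, N) * s1**3)
    return bracket / n**3

def third_moment_brute(universe, n):
    """E(m_1^3) by averaging over every ordered sample."""
    samples = list(permutations(universe, n))
    return sum(Fraction(sum(s), n)**3 for s in samples) / len(samples)

universe = [1, 2, 4, 7, 11]                # illustrative values only
big_N = 10**6
limit_gap = abs(big_N * carver_p3(3, big_N) - 3)   # N P_3 should approach n = 3
```

For this universe the formula and the enumeration agree for n = 2 and n = 3, and `limit_gap` is already below 1/10000 at N = 10^6.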

More generally

    P_{m_1 \cdots m_r} (Q_1)(Q_2) \cdots (Q_r) = P_{m_1 \cdots m_r} N^r \mu_{Q_1} \mu_{Q_2} \cdots \mu_{Q_r}.    {23}

As N approaches infinity this becomes

    P_{m_1 \cdots m_r} (Q_1)(Q_2) \cdots (Q_r) = n^{(r)} \mu_{Q_1} \mu_{Q_2} \cdots \mu_{Q_r}.    {24}

The laws of infinite sampling may be obtained by replacing power sums by
moments and P_{m_1 \cdots m_r} by n^{(r)}. The tables given in a recent paper [31; 30-32]
were obtained from the tables of P functions by this method.


20. Sampling With Replacements. We next consider the case of finite
sampling with replacements after each drawing. This is such a simple case
that the P’s can be determined without finding the k’s.

Consider a universe and the N^n possible ordered samples. Thus the nine
ordered samples of 2 from a universe of 3 are indicated by the subscripts

    11   21   31
    12   22   32
    13   23   33

The samples 11, 22, 33 are not repeated while the others are. The multiplication
theorem can be used in grouping types of product terms as it was in Part I,
but the terms themselves have different interpretation. Thus (1)(1) = (2) + (11)
can be written as (1)(1) = (2) + [11] where the (2) indicates the sum of the
n terms found by multiplying an x by itself, while the [11] indicates the sum of
the n(n - 1) products formed by multiplying one x by another. Since some
of the x’s may be alike, it is possible to have squared terms in [11], but they
are not treated as squared terms, but rather as products. For example, if

    (1) = x_1 + x_1

then

    (1)(1) = x_1^2 + x_1^2 + x_1 x_1 + x_1 x_1

so that

    (2) = x_1^2 + x_1^2 and [11] = x_1 x_1 + x_1 x_1.

In determining the expected value of (1)(1), we note that

    \sum (1)(1) = \sum (2) + \sum [11]

where \sum holds for the N^n possible samples. Now E(2) = k_2 (2) with k_2 = \frac{n}{N},
so that E(2) = \frac{n}{N} (2) as indicated in Theorem V. Also \sum [11] is composed of
N^n n^{(2)} products of the type x_i x_j, while (1)(1) = (\sum x_i)(\sum x_j) is composed of
N^2 such products. It follows that

    \sum [11] = \frac{N^n n^{(2)}}{N^2} (1)(1) and that E[11] = \frac{n^{(2)}}{N^2} (1)(1).

It appears that \frac{n^{(2)}}{N^2} plays the role of P_{11}.

    \mu_2(m_1) = \frac{1}{n^2} E[(2) + [11]] = \frac{1}{n^2} [P_2 (2) + P_{11} (1)(1)].


The corresponding argument holds for the general case. Any product of power
sums can be expanded in terms of (q_1 q_2 \cdots q_s). If duplicate variates are
introduced, use the notation [q_1 q_2 \cdots q_s]. Form [q_1 q_2 \cdots q_s] for all the N^n
ordered samples. Now [q_1 q_2 \cdots q_s] has n^{(s)} terms and \sum [q_1 q_2 \cdots q_s] =
k (Q_1)(Q_2) \cdots (Q_s) has n^{(s)} N^n terms, while (Q_1)(Q_2) \cdots (Q_s) has N^s terms.
It follows that k = \frac{n^{(s)} N^n}{N^s}, that

    \sum [q_1 q_2 \cdots q_s] = \frac{n^{(s)} N^n}{N^s} (Q_1)(Q_2) \cdots (Q_s),

and that

    E[q_1 q_2 \cdots q_s] = \frac{n^{(s)}}{N^s} (Q_1)(Q_2) \cdots (Q_s).    {25}

Hence

    P_{m_1 m_2 \cdots m_s} = \frac{n^{(s)}}{N^s}.    {26}

In general

    P_{m_1 \cdots m_r} (Q_1)(Q_2) \cdots (Q_r) = n^{(r)} \mu_{Q_1} \mu_{Q_2} \cdots \mu_{Q_r}.    {27}

Comparison with {24} shows that the same basic laws appear no matter whether
sampling is carried on with replacement, or, in the infinite case, without
replacement.
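
For sampling with replacement the statement that each P depends only on the number of parts, taking the value n^{(s)}/N^s, can be verified on small cases. The sketch below is an illustration under that assumption; the function names are hypothetical, and only the second and third powers of the sample power sum (1) are covered.

```python
from fractions import Fraction
from itertools import product

def falling(x, k):
    """Falling factorial x(x-1)...(x-k+1)."""
    out = 1
    for j in range(k):
        out *= x - j
    return out

def p_with_replacement(n, N, parts):
    """Assumed Carver function for a subscript with the given number of parts."""
    return Fraction(falling(n, parts), N**parts)

def power_moment_brute(universe, n, r):
    """E[(1)^r] averaged over all N^n ordered samples drawn with replacement."""
    samples = list(product(universe, repeat=n))
    return Fraction(sum(sum(s)**r for s in samples), len(samples))

def power_moment_formula(universe, n, r):
    N = len(universe)
    s1, s2, s3 = (sum(x**t for x in universe) for t in (1, 2, 3))
    P = lambda parts: p_with_replacement(n, N, parts)
    if r == 2:
        return P(1) * s2 + P(2) * s1 * s1                    # P_2 (2) + P_11 (1)(1)
    if r == 3:
        return P(1) * s3 + 3 * P(2) * s2 * s1 + P(3) * s1**3 # P_3 (3) + 3P_21 (2)(1) + P_111 (1)^3
    raise ValueError("only r = 2 and r = 3 are illustrated")

universe, n = [1, 2, 4], 3                 # illustrative values only
```

Both `power_moment_formula(universe, n, 2)` and `power_moment_formula(universe, n, 3)` reproduce the values obtained by enumerating the 27 ordered samples.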


21. Other Replacement Laws. The two cases just examined represent two
extremes of orderly replacement laws. It has been shown in each case how the
Carver functions can be used to express relations between the moments of the
moments of the sample and the moments of the universe. It is possible to show
how these functions are applicable to other replacement laws. We take, as an
illustration, the case in which no replacements are made after each of the first
n - 1 drawings, but just before the last drawing the n - 1 variates are replaced
and mixed. I do not present here the detailed argument, but simply indicate
that the appropriate value of k_{p_1 p_2 \cdots p_s} is

h Pi .,,p A 

= ~ + jpgl ((n - 2) w - n w + (T + 2« + . • • + 2")(« - (28} 


n 

where P% - « and Pn » ==-, 
N jv 2 





22. Different Frequency Laws. The distribution of variates may follow some
known frequency law such as the normal, rectangular, binomial, Poisson, etc.
In such a case, if the relations between the moments are known, it is possible
to simplify the results.

Chapter IV. The Moments of the Mean 

To illustrate the previous theory in a simple situation we consider the moments 
of the mean. Carver [25] has done this previously for the case of finite sampling 
without replacements, but he has taken the measures of the universe as devia¬ 
tions and has used the sample sum rather than the sample mean. O’Toole [26] 
has generalized Carver’s work. 

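23. The Moments of the Mean. We have at once

    \mu_1(m_1) = \frac{E(1)}{n} = \frac{P_1 (1)}{n} = \mu_1

    \mu_2(m_1) = \frac{1}{n^2} E(1)^2 = \frac{1}{n^2} [P_2 (2) + P_{11} (1)(1)]

    \mu_3(m_1) = \frac{1}{n^3} E(1)^3 = \frac{1}{n^3} [P_3 (3) + 3P_{21} (2)(1) + P_{111} (1)(1)(1)]

and

    \mu_4(m_1) = \frac{1}{n^4} E(1)^4 = \frac{1}{n^4} [P_4 (4) + 4P_{31} (3)(1) + 3P_{22} (2)(2) + 6P_{211} (2)(1)(1)
        + P_{1111} (1)^4].    {29}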



24, Moments About the Mean of the Sample Mean. Using {11, we get 

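\[
\begin{aligned}
\mu_2(m_1) &= \frac{1}{n^2}\left[P_2(2) + (P_{11} - P_1^2)(1)^2\right] \\
\mu_3(m_1) &= \frac{1}{n^3}\left[P_3(3) + 3(P_{21} - P_2P_1)(2)(1) + (P_{111} - 3P_{11}P_1 + 2P_1^3)(1)^3\right] \\
\mu_4(m_1) &= \frac{1}{n^4}\left[P_4(4) + 4(P_{31} - P_3P_1)(3)(1) + 3P_{22}(2)(2)\right. \\
&\qquad + 6(P_{211} - 2P_{21}P_1 + P_2P_1^2)(2)(1)(1) \\
&\qquad \left. + (P_{1111} - 4P_{111}P_1 + 6P_{11}P_1^2 - 3P_1^4)(1)^4\right]
\end{aligned}
\tag{30}
\]

etc.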



COMBINED EXPANSIONS 




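These formulas can be written in the notation of moments of the universe as

\[
\begin{aligned}
\mu_2(m_1) &= \frac{1}{n^2}\left[P_2 N\mu'_2 + (P_{11} - P_1^2) N^2 \mu'^2_1\right] \\
\mu_3(m_1) &= \frac{1}{n^3}\left[P_3 N\mu'_3 + 3(P_{21} - P_2P_1) N^2 \mu'_2\mu'_1 + (P_{111} - 3P_{11}P_1 + 2P_1^3) N^3 \mu'^3_1\right]
\end{aligned}
\tag{31}
\]

etc.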


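25. Moments of the Sample Mean When the Universe is Measured About Its
Mean. When (1) = 0, the formulas of section 23 become

\[
\begin{aligned}
\bar\mu'_1(m_1) &= 0 \\
\bar\mu'_2(m_1) &= \frac{1}{n^2} P_2(2) \\
\bar\mu'_3(m_1) &= \frac{1}{n^3} P_3(3)
\end{aligned}
\]

and

\[
\begin{aligned}
\bar\mu'_4(m_1) &= \frac{1}{n^4}\left[P_4(4) + 3P_{22}(2)^2\right] \\
\bar\mu'_r(m_1) &= \frac{1}{n^r} \sum \frac{r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\,
P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, (p_1)^{\pi_1} \cdots (p_s)^{\pi_s}
\end{aligned}
\tag{32}
\]

where the $\sum$ holds for all partitions having no unit parts. In the language of
moments {32} becomes

\[
\bar\mu'_r(m_1) = \frac{1}{n^r} \sum \frac{r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\,
P_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, N^{\pi_1 + \cdots + \pi_s}\, \mu_{p_1}^{\pi_1} \cdots \mu_{p_s}^{\pi_s} \tag{33}
\]

where the $\sum$ holds for all partitions of $r$ having no unit parts.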

26. Moments About the Mean of the Sample When the Universe is Measured
From its Mean. Similarly, when (1) = 0, the results of section 24 become

\[
\begin{aligned}
\bar\mu_2(m_1) &= \frac{1}{n^2} P_2(2) = \frac{1}{n^2} P_2 N\mu_2 \\
\bar\mu_3(m_1) &= \frac{1}{n^3} P_3(3) = \frac{1}{n^3} P_3 N\mu_3 \\
\bar\mu_4(m_1) &= \frac{1}{n^4}\left[P_4(4) + 3P_{22}(2)^2\right]
\end{aligned}
\tag{34}
\]

It is to be noticed that the values $\bar\mu_r(m_1)$ are equal to the values $\bar\mu'_r(m_1)$. This
results from {4} and the fact that $\bar\mu'_1(m_1) = 0$. It should be noted also that

fir ('nil) % MrOm) as MW 0. 

27. Sampling Without Replacements. The formulas in sections 23-26 are
general formulas which become more specific as given replacement laws are
introduced. If the law is sampling without replacements, we recall that
$P_1 = \rho_1$, $P_2 = \rho_1 - \rho_2$, $P_3 = \rho_1 - 3\rho_2 + 2\rho_3$, etc., when
$\rho_s = n^{(s)}/N^{(s)}$. It is at once possible to write the appropriate formula. Thus

\[
\bar\mu_3(m_1) = \mu_3(m_1) = \frac{1}{n^3} P_3 N\mu_3
= \frac{1}{n^3}\left[\rho_1 - 3\rho_2 + 2\rho_3\right] N\mu_3
= \frac{(N - n)(N - 2n)}{n^2 (N - 1)(N - 2)}\, \mu_3. \tag{35}
\]

Now $\mu_3 = 0$ in any symmetric universe, for example a normal or rectangular
one, so $\mu_3(m_1) = 0$.
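Formula {35} can be checked by direct enumeration on a small universe. The sketch below is only an illustration: the universe of six values and the function names are hypothetical choices, not part of the original argument, and exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import combinations


def central_moment(values, r):
    """r-th moment of `values` about their own mean (divisor = number of values)."""
    count = len(values)
    mean = Fraction(sum(values), count)
    return sum((v - mean) ** r for v in values) / count


def mu3_mean_by_enumeration(universe, n):
    """Third central moment of the sample mean over all equally likely
    samples of n items drawn without replacement."""
    means = [Fraction(sum(sample), n) for sample in combinations(universe, n)]
    return central_moment(means, 3)


def mu3_mean_by_formula_35(universe, n):
    """Right-hand member of {35}."""
    N = len(universe)
    coefficient = Fraction((N - n) * (N - 2 * n), n**2 * (N - 1) * (N - 2))
    return coefficient * central_moment(universe, 3)


universe = [0, 1, 1, 2, 7, 10]  # hypothetical universe, N = 6
for n in range(1, 6):
    print(n, mu3_mean_by_enumeration(universe, n), mu3_mean_by_formula_35(universe, n))
```

The two columns printed for each n should agree; for n = 3 = N/2 both vanish, as the factor N - 2n in {35} requires.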

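28. Sampling With Replacements. In this case $P_{m_1, \ldots, m_t} = n^{(t)}/N^t$ and we have

\[
\begin{aligned}
\mu'_1(m_1) &= \mu'_1 \\
\mu'_2(m_1) &= \frac{1}{n^2}\left[n\mu'_2 + n(n - 1)\mu'^2_1\right] \\
\mu'_3(m_1) &= \frac{1}{n^3}\left[n\mu'_3 + 3n(n - 1)\mu'_2\mu'_1 + n(n - 1)(n - 2)\mu'^3_1\right] \\
\mu'_4(m_1) &= \frac{1}{n^4}\left[n\mu'_4 + 4n(n - 1)\mu'_3\mu'_1 + 3n(n - 1)\mu'^2_2 + 6n(n - 1)(n - 2)\mu'_2\mu'^2_1 + n^{(4)}\mu'^4_1\right]
\end{aligned}
\]

and in general

\[
\mu'_r(m_1) = \frac{1}{n^r} \sum \frac{r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\,
n^{(\pi_1 + \cdots + \pi_s)}\, \mu'^{\pi_1}_{p_1} \cdots \mu'^{\pi_s}_{p_s} \tag{36}
\]

and

\[
\begin{aligned}
\mu_2(m_1) &= \frac{1}{n^2}\left[n\mu'_2 - n\mu'^2_1\right] \\
\mu_3(m_1) &= \frac{1}{n^3}\left[n\mu'_3 - 3n\mu'_2\mu'_1 + 2n\mu'^3_1\right] \\
\mu_4(m_1) &= \frac{1}{n^4}\left[n\mu'_4 - 4n\mu'_3\mu'_1 + 3n(n - 1)\mu'^2_2 - 6n(n - 2)\mu'_2\mu'^2_1 + 3n(n - 2)\mu'^4_1\right]
\end{aligned}
\tag{37}
\]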


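while

\[
\begin{aligned}
\bar\mu_2(m_1) &= \mu_2(m_1) = \frac{\mu_2}{n} \\
\bar\mu_3(m_1) &= \mu_3(m_1) = \frac{\mu_3}{n^2} \\
\bar\mu_4(m_1) &= \mu_4(m_1) = \frac{1}{n^3}\left[\mu_4 + 3(n - 1)\mu_2^2\right]
\end{aligned}
\tag{38}
\]

etc.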
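The relations {38} lend themselves to a brute-force check in which every ordered sample drawn with replacement is listed. The following sketch assumes a hypothetical universe of four values; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def central_moment(values, r):
    """r-th moment of `values` about their own mean (divisor = number of values)."""
    count = len(values)
    mean = Fraction(sum(values), count)
    return sum((v - mean) ** r for v in values) / count


def mean_moments_by_enumeration(universe, n):
    """Second, third and fourth central moments of the sample mean over all
    N**n ordered samples drawn with replacement."""
    means = [Fraction(sum(sample), n) for sample in product(universe, repeat=n)]
    return [central_moment(means, r) for r in (2, 3, 4)]


def mean_moments_by_formula_38(universe, n):
    """Right-hand members of {38}."""
    mu2, mu3, mu4 = (central_moment(universe, r) for r in (2, 3, 4))
    return [mu2 / n, mu3 / n**2, (mu4 + 3 * (n - 1) * mu2**2) / n**3]


universe = [0, 1, 3, 8]  # hypothetical universe, N = 4
for n in (1, 2, 3):
    print(n, mean_moments_by_enumeration(universe, n) == mean_moments_by_formula_38(universe, n))
```

Because the universe moments here are taken about the universe mean, the check does not depend on where the origin of measurement is placed.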


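29. Sampling With Replacements Before the Last Drawing Only. The
values of $k_{p_1 \cdots p_s}$ of section 21 determine the values of the P's. Thus

\[
P_2 = \frac{n}{N} - \frac{(n - 1)(n - 2)}{N(N - 1)}, \qquad
P_{11} = \frac{n(n - 1)}{N(N - 1)} - \frac{2(n - 1)}{N^2(N - 1)} \tag{39}
\]

and

\[
\mu_2(m_1) = \frac{1}{n^2}\left[\left(n - \frac{(n - 1)(n - 2)}{N - 1}\right)\mu'_2
+ \left(\frac{(n - 1)(Nn - 2)}{N - 1} - n^2\right)\mu'^2_1\right]
= \frac{1}{n^2}\left(n - \frac{(n - 1)(n - 2)}{N - 1}\right)\mu_2. \tag{40}
\]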


30. Different Frequency Laws. As indicated in section 22, the frequency
distributions of the parent may be characterized by some moment relationship.
This relationship can be inserted and the resulting formula simplified. For
example, if the law of the formation of the universe is that of the hypergeometric
series [25; 113]

\[
\mu_r = pq\left[q^{r-1} + (-1)^r p^{r-1}\right], \tag{41}
\]

we have

\[
\begin{aligned}
\mu_2(m_1) &= \frac{P_2}{n^2} Npq \\
\mu_3(m_1) &= \frac{P_3}{n^3} Npq(q^2 - p^2) \\
\mu_4(m_1) &= \frac{1}{n^4}\left[P_4 Npq(q^3 + p^3) + 3P_{22} N^2 p^2 q^2\right]
\end{aligned}
\tag{42}
\]

etc., where the values of $P_2$, $P_3$, $P_4$ are to be inserted according to the replacement
law which is used in forming the samples. The results for sampling without
replacement agree with those given by Pearson [1].


31. Moments of the Sample Sum. We might use the sum of the items in the
sample instead of the sample mean. For example

\[
\mu'_r[(1)] = E(1)^r = n^r E(m_1)^r = n^r \mu'_r(m_1).
\]


The results would parallel the results above except that n in the denominator 
would be eliminated. It is the sample sum which is used in Carver’s article 
[25] and this should be noted in comparing results. 

Chapter V. The Mean and Variance of the Variance 

As a further illustration of the use of the Carver functions there are presented 
in this chapter formulas for the mean of the variance and the variance of the 
variance. 


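32. The Mean of the Variance.

\[
\mu'_1(m_2) = E\left[\frac{(2)}{n} - \frac{(1)(1)}{n^2}\right]
= \frac{P_1}{n}(2) - \frac{P_2(2) + P_{11}(1)^2}{n^2}
= \frac{1}{n^2}\left[(nP_1 - P_2)(2) - P_{11}(1)^2\right] \tag{43}
\]

and

\[
\bar\mu'_1(m_2) = \frac{1}{n^2}(nP_1 - P_2) N\mu_2. \tag{44}
\]

When sampling is with replacements $P_1 = P_2 = n/N$ and we get the well known

\[
\mu'_1(m_2) = \frac{n - 1}{n}\, \mu_2 \tag{45}
\]

while when sampling is without replacements, we have the well known

\[
\mu'_1(m_2) = \frac{1 - \dfrac{1}{n}}{1 - \dfrac{1}{N}}\, \mu_2. \tag{46}
\]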
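As an illustration of how {44} specializes, the sketch below inserts the P's for the two replacement laws and compares the result with a direct enumeration of samples and with {45} and {46}. The universe and the function names are hypothetical; the values $P_1 = \rho_1$, $P_2 = \rho_1 - \rho_2$ for sampling without replacements are those recalled in section 27.

```python
from fractions import Fraction
from itertools import combinations, product


def central_moment(values, r):
    """r-th moment of `values` about their own mean (divisor = number of values)."""
    count = len(values)
    mean = Fraction(sum(values), count)
    return sum((v - mean) ** r for v in values) / count


def expected_m2_by_enumeration(universe, n, with_replacement):
    """Average of the sample variance m2 over all equally likely samples."""
    draws = product(universe, repeat=n) if with_replacement else combinations(universe, n)
    variances = [central_moment(sample, 2) for sample in draws]
    return sum(variances) / len(variances)


def expected_m2_by_formula_44(universe, n, with_replacement):
    """(n P1 - P2) N mu2 / n**2 with the P's of the chosen replacement law."""
    N = len(universe)
    rho1 = Fraction(n, N)
    rho2 = Fraction(n * (n - 1), N * (N - 1))
    P1 = rho1
    P2 = rho1 if with_replacement else rho1 - rho2
    return (n * P1 - P2) * N * central_moment(universe, 2) / n**2


universe = [2, 3, 5, 11, 13]  # hypothetical universe, N = 5
n, N = 3, len(universe)
mu2 = central_moment(universe, 2)
print(expected_m2_by_formula_44(universe, n, True), Fraction(n - 1, n) * mu2)
print(expected_m2_by_formula_44(universe, n, False),
      (1 - Fraction(1, n)) / (1 - Fraction(1, N)) * mu2)
```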


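33. The Second Moment of the Variance.

\[
\mu'_2(m_2) = E\left[\frac{(2)(2)}{n^2} - \frac{2(2)(1)(1)}{n^3} + \frac{(1)^4}{n^4}\right]
\]

becomes

\[
\begin{aligned}
\mu'_2(m_2) &= \left(\frac{P_2}{n^2} - \frac{2P_3}{n^3} + \frac{P_4}{n^4}\right)(4)
- 4\left(\frac{P_{21}}{n^3} - \frac{P_{31}}{n^4}\right)(3)(1) \\
&\quad + \left(\frac{P_{11}}{n^2} - \frac{2P_{21}}{n^3} + \frac{3P_{22}}{n^4}\right)(2)(2)
- 2\left(\frac{P_{111}}{n^3} - \frac{3P_{211}}{n^4}\right)(2)(1)(1)
+ \frac{P_{1111}}{n^4}(1)^4.
\end{aligned}
\tag{47}
\]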



These of course can be written in terms of moments of the variance. 


(48} 


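34. The Variance of the Variance. Since $\mu_2(m_2) = \mu'_2(m_2) - [\mu'_1(m_2)]^2$, we
have

\[
\begin{aligned}
\mu_2(m_2) &= \left(\frac{P_2}{n^2} - \frac{2P_3}{n^3} + \frac{P_4}{n^4}\right)(4)
- 4\left(\frac{P_{21}}{n^3} - \frac{P_{31}}{n^4}\right)(3)(1) \\
&\quad + \left[\frac{P_{11}}{n^2} - \frac{2P_{21}}{n^3} + \frac{3P_{22}}{n^4}
- \left(\frac{P_1}{n} - \frac{P_2}{n^2}\right)^2\right](2)(2) \\
&\quad - 2\left(\frac{P_{111}}{n^3} - \frac{3P_{211}}{n^4} - \frac{P_1P_{11}}{n^3} + \frac{P_2P_{11}}{n^4}\right)(2)(1)(1)
+ \left(\frac{P_{1111}}{n^4} - \frac{P_{11}^2}{n^4}\right)(1)^4.
\end{aligned}
\tag{49}
\]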


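Formula {49} may also be written as

\[
\begin{aligned}
\mu_2(m_2) = \frac{1}{n^4}\Big\{ & (n^2P_2 - 2nP_3 + P_4)N\mu'_4 - 4(nP_{21} - P_{31})N^2\mu'_3\mu'_1 \\
& + (n^2P_{11} - 2nP_{21} + 3P_{22} - n^2P_1^2 + 2nP_1P_2 - P_2^2)N^2\mu'^2_2 \\
& - 2(nP_{111} - 3P_{211} - nP_1P_{11} + P_2P_{11})N^3\mu'_2\mu'^2_1 + (P_{1111} - P_{11}^2)N^4\mu'^4_1 \Big\}.
\end{aligned}
\tag{50}
\]

Formulas {49} and {50} are not expressed in terms of deviations of the variates.
Neither do they assume any particular replacement law nor any particular
type of universe.

In case the universe is measured about its mean we can write at once, by
placing (1) = 0 in {49},

\[
\bar\mu_2(m_2) = \left(\frac{P_2}{n^2} - \frac{2P_3}{n^3} + \frac{P_4}{n^4}\right)(4)
+ \left[\frac{P_{11}}{n^2} - \frac{2P_{21}}{n^3} + \frac{3P_{22}}{n^4}
- \left(\frac{P_1}{n} - \frac{P_2}{n^2}\right)^2\right](2)^2 \tag{51}
\]

and

\[
\bar\mu_2(m_2) = \frac{1}{n^4}\Big\{(n^2P_2 - 2nP_3 + P_4)N\mu_4
+ (n^2P_{11} - 2nP_{21} + 3P_{22} - n^2P_1^2 + 2nP_1P_2 - P_2^2)N^2\mu_2^2\Big\}. \tag{52}
\]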

35. Sampling Without Replacements. Using the P's as defined by sampling
without replacements, it appears that the coefficient of the $\mu_4$ term

\[
\left(\frac{P_2}{n^2} - \frac{2P_3}{n^3} + \frac{P_4}{n^4}\right)N
= \frac{N(N - n)(n - 1)(Nn - N - n - 1)}{n^3(N - 1)(N - 2)(N - 3)} \tag{53}
\]

agrees with that given by Neyman [18; 477], Tchouproff [15; 660], Pepper
[24; 234], Carver [25; 270]. Also the coefficient of the $\mu_2^2$ term

\[
\left[\frac{P_{11}}{n^2} - \frac{2P_{21}}{n^3} + \frac{3P_{22}}{n^4}
- \left(\frac{P_1}{n} - \frac{P_2}{n^2}\right)^2\right]N^2
= -\frac{N(N - n)(n - 1)(N^2n - 3N^2 + 6N - 3n - 3)}{n^3(N - 1)^2(N - 2)(N - 3)} \tag{54}
\]

agrees with that of the above authors.

As far as the author is aware, no one has written the coefficients of $\mu'_3\mu'_1$, $\mu'_2\mu'^2_1$,
and $\mu'^4_1$ in the formula for $\mu_2(m_2)$.
The coefficient of $\mu'_3\mu'_1$ is

\[
\frac{-4N(n - 1)(N - n)(Nn - N - n - 1)}{n^3(N - 1)(N - 2)(N - 3)}. \tag{55}
\]

The coefficient of $\mu'_2\mu'^2_1$ is

\[
-2\left(\frac{P_{111}}{n^3} - \frac{3P_{211}}{n^4} - \frac{P_1P_{11}}{n^3} + \frac{P_2P_{11}}{n^4}\right)N^3
= \frac{4N^2(n - 1)(N - n)[(2n - 3)N - 3(n - 1)]}{n^3(N - 1)^2(N - 2)(N - 3)}, \tag{56}
\]

while the coefficient of $\mu'^4_1$ is

\[
\left(\frac{P_{1111}}{n^4} - \frac{P_{11}^2}{n^4}\right)N^4
= \frac{-2N^2(n - 1)(N - n)[(2n - 3)N - 3(n - 1)]}{n^3(N - 1)^2(N - 2)(N - 3)}. \tag{57}
\]

It is possible with some algebraic manipulation to use the P functions to express
the coefficients of the moments as functions of N and n. The suggestion here
is that such algebraic work is unnecessary since the left members of {53}, ...,
{57} are as easily handled in an actual problem as the right hand members.
It is possible to compute the coefficients from the $\rho$'s and the P's without writing
explicit expansions in terms of N and n. Besides, the formulas involving N
and n are so lengthy that algebraic errors are apt to occur. The use of Carver
functions is further advocated because the same basic formulas are applicable to
all types of sampling and because the tables of Chapter I of Part I are directly
applicable.
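The suggestion that the coefficients be computed from the $\rho$'s rather than from expansions in N and n can be sketched numerically. In the block below the expressions for $P_1$, $P_2$, $P_3$ are those recalled in section 27; the expressions for $P_4$, $P_{11}$, $P_{21}$, $P_{22}$ are not quoted in this section and are assumptions obtained by the same argument (they are not taken from the tables of Part I). The universe and all function names are hypothetical. The result is compared with the right members of {53} and {54} and with a direct enumeration.

```python
from fractions import Fraction
from itertools import combinations


def central_moment(values, r):
    """r-th moment of `values` about their own mean (divisor = number of values)."""
    count = len(values)
    mean = Fraction(sum(values), count)
    return sum((v - mean) ** r for v in values) / count


def falling(x, s):
    """Factorial power x(x - 1)...(x - s + 1)."""
    result = 1
    for k in range(s):
        result *= x - k
    return result


def carver_P(n, N):
    """P functions for sampling without replacements, from rho_s = n^(s)/N^(s).
    Entries beyond P1, P2, P3 are assumptions derived for this sketch."""
    r1, r2, r3, r4 = (Fraction(falling(n, s), falling(N, s)) for s in range(1, 5))
    return {"1": r1, "2": r1 - r2, "3": r1 - 3 * r2 + 2 * r3,
            "4": r1 - 7 * r2 + 12 * r3 - 6 * r4,
            "11": r2, "21": r2 - r3, "22": r2 - 2 * r3 + r4}


def variance_of_m2_via_P(universe, n):
    """Formula {52} with the P's inserted numerically."""
    N = len(universe)
    P = carver_P(n, N)
    mu2, mu4 = central_moment(universe, 2), central_moment(universe, 4)
    coeff_mu4 = n**2 * P["2"] - 2 * n * P["3"] + P["4"]
    coeff_mu2sq = (n**2 * P["11"] - 2 * n * P["21"] + 3 * P["22"]
                   - (n * P["1"] - P["2"]) ** 2)
    return (coeff_mu4 * N * mu4 + coeff_mu2sq * N**2 * mu2**2) / n**4


def variance_of_m2_closed_form(universe, n):
    """Right members of {53} and {54}."""
    N = len(universe)
    mu2, mu4 = central_moment(universe, 2), central_moment(universe, 4)
    common = Fraction(N * (N - n) * (n - 1), n**3 * (N - 1) * (N - 2) * (N - 3))
    a = common * (N * n - N - n - 1)
    b = -common * (N**2 * n - 3 * N**2 + 6 * N - 3 * n - 3) / (N - 1)
    return a * mu4 + b * mu2**2


def variance_of_m2_by_enumeration(universe, n):
    """Variance of the sample variance over all samples without replacement."""
    return central_moment([central_moment(s, 2) for s in combinations(universe, n)], 2)


universe = [0, 1, 1, 2, 7, 10]  # hypothetical universe, N = 6 (N must exceed 3)
for n in range(2, 6):
    print(n, variance_of_m2_via_P(universe, n))
```

Only the $\rho$'s change when another replacement law is wanted; the expression in terms of the P's is the same.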



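36. Sampling With Replacements. If $P_{m_1, \ldots, m_t} = n^{(t)}/N^t$, the coefficient of
$\mu_4$ is $\frac{1}{n^4}\left[n(n - 1)^2\right] = \frac{(n - 1)^2}{n^3}$, while the coefficient of $\mu_2^2$ is

\[
\frac{1}{n^4}\left[(n^2 - 2n + 3)n(n - 1) - (n^2 - 2n + 1)n^2\right] = -\frac{(n - 1)(n - 3)}{n^3}.
\]

Then {52} becomes

\[
\mu_2(m_2) = \frac{1}{n^3}\left[(n - 1)^2\mu_4 - (n - 1)(n - 3)\mu_2^2\right]. \tag{58}
\]

The formula for $\mu_2(m_2)$ becomes

\[
\begin{aligned}
\mu_2(m_2) = \frac{1}{n^3}\big[ & (n - 1)^2\mu'_4 - 4(n - 1)^2\mu'_3\mu'_1 - (n - 1)(n - 3)\mu'^2_2 \\
& + 4(2n - 3)(n - 1)\mu'_2\mu'^2_1 - 2(2n - 3)(n - 1)\mu'^4_1 \big].
\end{aligned}
\tag{59}
\]

Now {58} can be written in terms of semi-invariants by the use of $\mu_4 = \lambda_4 + 3\lambda_2^2$
and $\mu_2 = \lambda_2$ so

\[
\mu_2(m_2) = \frac{1}{n^3}\left[(n - 1)^2\lambda_4 + 2n(n - 1)\lambda_2^2\right].
\]

See [B'; 209], [22; 57].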

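37. Different Distribution Laws. Given frequency laws can be inserted.
Thus {44} becomes

\[
\mu'_1(m_2) = \frac{nP_1 - P_2}{n^2}\, Npq \quad \text{when } \mu_2 = pq
\]

while {52} becomes, if $\mu_2 = pq$ and $\mu_4 = pq(q^3 + p^3)$,

\[
\begin{aligned}
\mu_2(m_2) &= \frac{1}{n^4}(n^2P_2 - 2nP_3 + P_4)Npq(q^3 + p^3) \\
&\quad + \frac{1}{n^4}(n^2P_{11} - 2nP_{21} + 3P_{22} - n^2P_1^2 + 2nP_1P_2 - P_2^2)N^2p^2q^2.
\end{aligned}
\tag{60}
\]

Other frequency laws can be inserted similarly.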

Chapter VI. Tabular Presentation of Formulas. 

It is the purpose of this dissertation to show how the P functions can be used
in finite sampling rather than to present an exhaustive list of formulas. The
specific formulas of the two previous chapters are derived, primarily, for
illustrative purposes. The implication is that other formulas may be derived similarly.

However, it is possible to present, implicitly in tabular form, a number of
formulas. In this chapter there are presented formulas involving moments of
weight equal to or less than 6.


38. The Formulas of Weight 2.

\[
\mu'_2(m_1) = \left[\frac{P_2}{n^2}(2) + \frac{P_{11}}{n^2}(1)^2\right]
\]

can be written in tabular form as

            |   2      11    |    2        11
      2     |   P_1          |    1/n
      11    |   P_2    P_11  |   -1/n^2    1/n^2

with little effort. The first entries in the top row indicate the power sums of
the universe, while the columnar entries indicate the moments of the sample.
Now

\[
m_2 = \frac{(2)}{n} - \frac{(1)(1)}{n^2}
\]

and

\[
m_1^2 = \frac{(1)(1)}{n^2}.
\]

The coefficients of the power sums in the expansion of $m$ are entered in the right
hand part of the table. Thus, under 2, there appear the entries $\frac{1}{n}$ and $-\frac{1}{n^2}$.
These when multiplied by the power sums as indicated on the left, give
$m_2 = \frac{(2)}{n} - \frac{(1)(1)}{n^2}$. Similarly $m_1^2 = \frac{(1)(1)}{n^2}$.

Now the expected value is given by the proper P function expansion. The left
hand portion of the table, which is the same as the P function table of Chapter I
of Part I, gives such expansions. Thus the coefficient of (2) in $E(m_2)$ is
$\frac{P_1}{n} - \frac{P_2}{n^2}$ while the coefficient of (1)(1) is $-\frac{P_{11}}{n^2}$. Hence the complete formula is

\[
\mu'_1(m_2) = \left(\frac{P_1}{n} - \frac{P_2}{n^2}\right)(2) - \frac{P_{11}}{n^2}(1)(1)
\]

as indicated above.

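39. The Formulas of Weight 3. Similarly the table

            |   3      21      111    |    3        21       111
      3     |   P_1                   |    1/n
      21    |   P_2    P_11           |   -3/n^2    1/n^2
      111   |   P_3    3P_21   P_111  |    2/n^3   -1/n^3    1/n^3

can be used to give the formulas

\[
\mu'_1(m_3) = \left(\frac{P_1}{n} - \frac{3P_2}{n^2} + \frac{2P_3}{n^3}\right)(3)
- 3\left(\frac{P_{11}}{n^2} - \frac{2P_{21}}{n^3}\right)(2)(1) + \frac{2P_{111}}{n^3}(1)^3 \tag{62}
\]

\[
\mu'_{11}(m_2, m_1) = \left(\frac{P_2}{n^2} - \frac{P_3}{n^3}\right)(3)
+ \left(\frac{P_{11}}{n^2} - \frac{3P_{21}}{n^3}\right)(2)(1) - \frac{P_{111}}{n^3}(1)^3 \tag{63}
\]

\[
\mu'_3(m_1) = \frac{P_3}{n^3}(3) + \frac{3P_{21}}{n^3}(2)(1) + \frac{P_{111}}{n^3}(1)^3. \tag{64}
\]

In case we wish to express the results in terms of moments about the mean,
(1) = 0, and we have

\[
\bar\mu'_1(m_3) = \left(\frac{P_1}{n} - \frac{3P_2}{n^2} + \frac{2P_3}{n^3}\right)(3) \tag{65}
\]

\[
\bar\mu'_{11}(m_2, m_1) = \left(\frac{P_2}{n^2} - \frac{P_3}{n^3}\right)(3) \tag{66}
\]

\[
\bar\mu'_3(m_1) = \frac{P_3}{n^3}(3) \tag{67}
\]

so that

\[
\bar\mu'_1(m_3) = \left(\frac{P_1}{n} - \frac{3P_2}{n^2} + \frac{2P_3}{n^3}\right)N\mu_3 \tag{68}
\]

\[
\bar\mu'_{11}(m_2, m_1) = \left(\frac{P_2}{n^2} - \frac{P_3}{n^3}\right)N\mu_3 \tag{69}
\]

\[
\bar\mu'_3(m_1) = \frac{P_3}{n^3}N\mu_3. \tag{70}
\]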


The insertion of specific sampling laws gives the specific results of earlier authors. 

40. The Tabular Forms. It is further evident that the power of n in the
denominator is equal to the sum of the subscripts of the Carver function above it.
We might utilize this knowledge and write in the right hand part of the table
the numerators of the entries in the tables above. The table of weight 3
would then appear as

            |   3      21      111    |    3     21    111
      3     |   P_1                   |    1
      21    |   P_2    P_11           |   -3     1
      111   |   P_3    3P_21   P_111  |    2    -1     1

and it is possible to read {62}, {63}, {64}, {65}, {66}, and {67} directly from it.





















[TABLE I. Tables of weight w = 2, 3, 4, 5, 6. The body of the table is not recoverable from this copy.]
















































































The tables of weight W = 2, 3, 4, 5, 6 are given in Table I. The right hand 
partitions not involving unit parts are underscored as these indicate the columns 
which should be used if the universe is measured about its mean, As an illustra¬ 
tion we write from Table I the value of minh). We get 



as previously indicated. 

The same tabular scheme can be used to write formulas of weight greater 
than 6. 

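41. Moments of Other Sample Moment Functions. It is possible to use a
similar tabular scheme when we wish to find the moments of other sample
moment functions. We define

\[
\begin{aligned}
l_2 &= \frac{(2)}{n} - \frac{(1)(1)}{n^2} \\
l_3 &= \frac{(3)}{n} - \frac{3(2)(1)}{n^2} + \frac{2(1)^3}{n^3} \\
l_4 &= \frac{(4)}{n} - \frac{4(3)(1)}{n^2} - \frac{3(2)(2)}{n^2} + \frac{12(2)(1)(1)}{n^3} - \frac{6(1)^4}{n^4}
\end{aligned}
\]

and, in general,

\[
l_r = \sum (-1)^{\rho - 1}(\rho - 1)!\,
\frac{r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\,
\frac{(p_1)^{\pi_1} \cdots (p_s)^{\pi_s}}{n^{\rho}}, \qquad \rho = \pi_1 + \cdots + \pi_s. \tag{71}
\]

The formulas of weight 5 are given by Table II.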


TABLE II

          |  5     41      32           311      221      2111      1^5      |    5    41    32   311   221  21^3   1^5
  5       |  P_1                                                              |    1
  41      |  P_2   P_11                                                       |   -5     1
  32      |  P_2           P_11                                               |  -10           1
  311     |  P_3   2P_21   P_21         P_111                                 |   20    -4    -1     1
  221     |  P_3   P_21    2P_21                 P_111                        |   30    -3    -3           1
  2111    |  P_4   3P_31   P_31+3P_22   3P_211   3P_211   P_1111              |  -60    12     5    -3    -2     1
  11111   |  P_5   5P_41   10P_32       10P_311  15P_221  10P_2111  P_11111   |   24    -6    -2     2     1    -1     1












































Thus, for example,



7 I} + 

«» T 


ISP, 

n 4 



+ 


lOPn ■ 12 F 31 
n 8 n 4 


36P22 60 Pa 2 ^ A 2 _ _ 

—-iV 

ft 4 ft 6 / 



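If all the entries in the right hand part of Table I, except the unit terms in the
main diagonal, are placed equal to 0, the tables can be used to give the moment
functions of the $m'_a$. Thus, when w = 3,

\[
\mu'_1(m'_3) = \frac{P_1}{n}(3) \tag{73}
\]

\[
\mu'_{11}(m'_2, m'_1) = \frac{P_2}{n^2}(3) + \frac{P_{11}}{n^2}(2)(1) \tag{74}
\]

\[
\mu'_3(m'_1) = \frac{P_3}{n^3}(3) + \frac{3P_{21}}{n^3}(2)(1) + \frac{P_{111}}{n^3}(1)^3 \tag{75}
\]

and

\[
\bar\mu'_1(m'_3) = \frac{P_1}{n}N\mu_3 \tag{76}
\]

\[
\bar\mu'_{11}(m'_2, m'_1) = \frac{P_2}{n^2}N\mu_3 \tag{77}
\]

\[
\bar\mu'_3(m'_1) = \frac{P_3}{n^3}N\mu_3. \tag{78}
\]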


42. Other Moment Functions. The tables give such formulas as $\mu'_r(m_a)$,
$\mu'_{r_1 r_2}(m_a, m_b)$, etc. If formulas for $\mu_r(m_a)$, $\mu_{r_1 r_2}(m_a, m_b)$, etc., are needed, it is
necessary to go through the usual work of changing from moments to moments
about the mean.

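Let us derive a general formula for the correlation of the mean and the variance
as an illustration of the use of the tabular formulas. By definition

\[
r(m_2, m_1) = \frac{\mu_{11}(m_2, m_1)}{\left[\mu_{20}(m_2, m_1)\, \mu_{02}(m_2, m_1)\right]^{1/2}}. \tag{79}
\]

Now

\[
\begin{aligned}
\mu_{11}(m_2, m_1) &= \mu'_{11}(m_2, m_1) - \mu'_1(m_2)\, \mu'_1(m_1) \\
\mu_{20}(m_2, m_1) &= \mu'_2(m_2) - [\mu'_1(m_2)]^2 = \mu_2(m_2) \\
\mu_{02}(m_2, m_1) &= \mu'_2(m_1) - [\mu'_1(m_1)]^2 = \mu_2(m_1).
\end{aligned}
\]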


Some of these values have appeared earlier in this paper. Without using the
earlier results, we find from Table I



Hence {79} becomes

\[
r(m_2, m_1) = \frac{(nP_2 - P_3)\mu_3}
{\left[(n^2P_2^2 - 2nP_3P_2 + P_4P_2)\mu_4\mu_2
+ (n^2P_2P_{11} - 2nP_2P_{21} + 3P_2P_{22} - n^2P_2P_1^2 + 2nP_2^2P_1 - P_2^3)N\mu_2^3\right]^{1/2}}. \tag{80}
\]

Formula {80} gives the correlation between the variance and the mean no
matter what the law of replacement. If the universe is symmetric, $\mu_3 = 0$ and
$r(m_2, m_1) = 0$.

The usual special cases may be obtained. When replacements are made,
{80} becomes at once

\[
r(m_2, m_1) = \frac{(n - 1)\mu_3}{\left[(n - 1)^2\mu_4\mu_2 - (n - 1)(n - 3)\mu_2^3\right]^{1/2}} \tag{81}
\]

as indicated by Pepper [24; 246].

When no replacements are made {80} reduces to results previously given by
Neyman [18; 489] and Pepper [24; 245].
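The special case of sampling with replacements can be checked by enumeration. The sketch below compares the square of the correlation obtained from every ordered sample with the square of the right member of {81}; squares are used so that exact rational arithmetic suffices, which means the sign of the correlation is not tested. The universe and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def central_moment(values, r):
    """r-th moment of `values` about their own mean (divisor = number of values)."""
    count = len(values)
    mean = Fraction(sum(values), count)
    return sum((v - mean) ** r for v in values) / count


def squared_correlation_by_enumeration(universe, n):
    """r(m2, m1)**2 over all N**n ordered samples drawn with replacement (n >= 2)."""
    samples = list(product(universe, repeat=n))
    means = [Fraction(sum(s), n) for s in samples]
    variances = [central_moment(s, 2) for s in samples]
    mean_bar = sum(means) / len(samples)
    var_bar = sum(variances) / len(samples)
    covariance = sum((m - mean_bar) * (v - var_bar)
                     for m, v in zip(means, variances)) / len(samples)
    return covariance**2 / (central_moment(means, 2) * central_moment(variances, 2))


def squared_correlation_by_formula_81(universe, n):
    """Square of the right member of {81}."""
    mu2, mu3, mu4 = (central_moment(universe, r) for r in (2, 3, 4))
    return ((n - 1) * mu3) ** 2 / (mu2 * ((n - 1) ** 2 * mu4 - (n - 1) * (n - 3) * mu2**2))


universe = [0, 1, 3, 8]  # hypothetical skew universe, N = 4
for n in (2, 3):
    print(n, squared_correlation_by_enumeration(universe, n),
          squared_correlation_by_formula_81(universe, n))
```

For a symmetric universe both quantities vanish, in agreement with the remark following {80}.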


43. Conclusion. The theory presented here is capable of generalization in 
many ways. For example, application to multivariate distributions readily 
follows. However an attempt has been made in this dissertation to emphasize 
the essence of the method. Illustrations have been chosen to indicate its 
inherent generality. 

It should be stated, finally, that the aim of this dissertation is not primarily 
to provide a list of sampling formulas, but rather to provide a method by which 
the desired sampling formula may be derived without too much algebraic work. 

In concluding this dissertation, I wish to acknowledge the guidance and
encouragement of Professor H. C. Carver. Also I wish to express my appreciation
to Professor R. A. Fisher and to Professor C. C. Craig, who read the manuscript,
or portions of it, and made needed suggestions for improvement. I am
also indebted to Professor J. A. Nyswander and Professor T. H. Hildebrandt for
valuable advice and assistance.


BIBLIOGRAPHY

(1) Pearson, K., "On Certain Properties of the Hypergeometric Series, etc.," Philosophical
Magazine, 45 (1899), pp. 231-246.

(2) Pearson, K., "On the Probable Error of Frequency Constants," Biometrika, 2 (1902-3),
pp. 273-281.

(3) Henderson, R., "Frequency Curves and Moments," Transactions of the Actuarial Society
of America, 8 (1904), pp. 30-42.

(4) Pearson, K., "Note on the Significant or Nonsignificant Character of a Sub-sample
Drawn from a Sample," Biometrika, 5 (1906), pp. 181-183.

(5) "Student," "The Probable Error of a Mean," Biometrika, 6 (1908), pp. 1-25.

(6) Pearson, K., "On the Probable Errors of Frequency Constants," Biometrika, 9
(1913), pp. 1-10.

(7) Isserlis, L., "On the Conditions Under Which the Probable Error of Frequency
Distributions Have Real Significance," Royal Society Proceedings, 92A (1916),
pp. 23-41.

(8) Isserlis, L., "On the Value of a Mean as Calculated From a Sample," Journal of
Royal Statistical Society, 81 (1918), pp. 76-81.

(9) Edgeworth, F., "On the Value of a Mean as Calculated from a Sample," Journal of
Royal Statistical Society, 81 (1918), pp. 624-632.

(10) Tchouproff, A., "On the Mathematical Expectation of a Positive Integral Power of
the Difference Between the Frequency and the Probability of an Event,"
Proceedings of the Petrograd Polytechnic Institute.

(11) Tchouproff, A., "Zur Theorie der Stabilität Statistischer Reihen," Skandinavisk
Aktuarietidskrift, (1918).

(12) Tchouproff, A., "On the Mathematical Expectation of the Moments of Frequency
Distributions," Part I, Biometrika, 12 (1918), pp. 140-169, 185-210. Part II,
Biometrika, 13 (1920-21), pp. 283-295.

(13) Pearson, K., "Peccavimus," Biometrika, 12 (1918-19), pp. 259-281.

(14) Splawa-Neyman, J., "La Revue Mensuelle de Statistique," Tome 6 (1923), pp. 1-29.

(15) Tchouproff, A., "On the Mathematical Expectation of the Moments of Frequency
Distributions in the Case of Correlated Observations," Metron, 2 (1923), pp.
461-493; 646-683.

(16) Pearson, K., "Note on Professor Romanovsky's Generalization of my Frequency
Curves," Biometrika, 16 (1924), pp. 116-117.

(17) Church, A., "On the Moments of the Distributions of Squared Deviations for Samples
of N Drawn from an Indefinitely Large Population," Biometrika, 17 (1925),
pp. 79-83.

(18) Neyman, J., "Contributions to the Theory of Small Samples Drawn From a Finite
Population," Biometrika, 17 (1925), pp. 472-479.

(19) Church, A., "On the Means and Squared Deviations of Small Samples From Any
Population," Biometrika, 18 (1926), pp. 321-394.

(20) Greenwood, M., and Isserlis, L., "A Historical Note on the Problems of Small
Samples," Royal Statistical Society Journal, 90 (1927), pp. 347-352.

(21) Pearson, K., "Another Historical Note on the Problem of Small Samples,"
Biometrika, 19 (1927), pp. 207-210.

(22) Craig, C. C., "An Application of Thiele's Semi-invariants to the Sampling Problem,"
Metron, 7 (1928-29), pp. 3-74.

(23) Fisher, R. A., "Moments and Product Moments of Sampling Distributions,"
Proceedings London Mathematical Society, 2 (30) (1929), pp. 199-238.

(24) Pepper, J., "Studies in the Theory of Sampling," Biometrika, 21 (1929), pp. 231-258.

(25) Carver, H. C., "Fundamentals of the Theory of Sampling," Annals of Mathematical
Statistics, 1 (1930), pp. 101-121; 260-274.

(26) O'Toole, A. L., "On Symmetric Functions and Symmetric Functions of Symmetric
Functions," Annals of Mathematical Statistics, 2 (1931), pp. 102-149.

(27) Fisher, R. A., and Wishart, J., "The Derivation of the Pattern Formulae of Two
Way Partitions From Those of Simpler Patterns," Proceedings London
Mathematical Society, 2-33 (1931), pp. 195-208.

(28) St. Georgescu, N., "Further Contributions to the Sampling Problem,"
Biometrika, 24 (1932), pp. 65-107.

(29) Wishart, J., "A Comparison of the Semi-invariants of the Distributions of the
Moment and the Semi-invariant Estimates in Sampling From an Infinite
Population," Biometrika, 25 (1933), pp. 52-60.

(30) Feldman, H., "Mathematical Expectations of Product Moments of Samples Drawn
from a Set of Infinite Populations," Annals of Mathematical Statistics, 6 (1935),
pp. 30-52.

(31) Dwyer, P. S., "Moments of Any Rational Integral Isobaric Sample Moment
Function," Annals of Mathematical Statistics, 8 (1937), pp. 21-65.

BOOKS

A. Whitworth, "Choice and Chance," 4th Edition (1886).

B. Thiele, T. N., "Theory of Observations," (1903).

B'. Reprinted, Annals of Mathematical Statistics, 2 (1931), pp. 165-300.

C. Mortara, "Elementi di Statistica," Roma (1917).

D. Rietz, H. L., (Editor-in-chief), "Handbook of Mathematical Statistics," (1924).

E. Rietz, H. L., "Mathematical Statistics," (1927).



DISTRIBUTIONS OF SUMS OF SQUARES OF RANK DIFFERENCES
FOR SMALL NUMBERS OF INDIVIDUALS¹

By E. G. Olds

I. INTRODUCTION

In a recent article,² reporting the results of research under a grant-in-aid from
the Carnegie Corporation of New York, Hotelling and Pabst have given a
comprehensive treatment of the theory and application of rank correlation and
have contributed significantly to existing knowledge on the subject. It is not
the purpose of this note to evaluate their contribution but to attempt the
solution of a problem they suggest.

In §3³ they have given the well-known formula for rank correlation,
$r' = 1 - \dfrac{6\sum d^2}{n^3 - n}$, where n is the number of individuals ranked and
$\sum d^2 = \sum_{i=1}^{n} d_i^2$ ($d_i$ being the rank difference for the ith individual). In §5 the
question of the significance of r' in small samples has been considered from the
following point of view: if the value of r', obtained from a comparison of the ranks
of n individuals as a possible measure of the relation between two attributes, is
such that there exists a high probability that it could have occurred by virtue of a
chance rearrangement of the n individuals, then the value of r' does not furnish a
significant indication of relationship. Then one test of the significance of a
particular value of r' is to note whether it has a probability less than P (P equal
to .01 or, less stringently, equal to .05) of occurring because of a chance re-ranking.

To apply this test it is necessary to have some information regarding the
distribution of r' for the chance rearrangements of the numbers from 1 to n.
Hotelling and Pabst have given the distribution of r' for the cases n = 2, 3, 4.
They have noted that the distribution is symmetrical for each value of n and
that it has a range from -1 to 1. From a consideration of the probabilities
corresponding to $\sum d^2 = 0, 2, 4, 6$, they have discussed the significance of values
of r' for n = 5, 6, 7. In §8 they have stated, "Another problem is to find
convenient and accurate approximations to the distribution of r', for moderate
values of n, with close limits of error. A table calculated along the lines
suggested in §5 would be very useful." This statement, along with the interest
manifested by others in private communications, has led to the investigation
reported in this paper.

¹ Presented to the American Mathematical Society, December 29, 1936.

² Harold Hotelling and Margaret Pabst, Rank Correlation and Tests of Significance
Involving No Assumption of Normality, Annals of Mathematical Statistics, Vol. VII,
1936, pp. 29-43.

³ Loc. cit.


II. EXACT DISTRIBUTION OF SUMS OF SQUARED DIFFERENCES

In the paper mentioned above, the authors have given the exact probabilities
for all possible values of r' for n = 2, 3, and 4. Since r' is a linear function of
$\sum d^2$ for any particular value of n, there is a one-to-one correspondence between
values of $\sum d^2$ and values of r'. For example, for the case of n = 3, we have the
following:

      \sum d^2     0       2       6       8
      r'           1       1/2    -1/2    -1
      p            1/3!    2/3!    2/3!    1/3!

where p represents the relative frequency of r' or of $\sum d^2$. Therefore it seems
pertinent to investigate the distribution of $\sum d^2$ for various values of n.
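For very small n the correspondence between $\sum d^2$ and r' can be produced by listing every re-ranking. The following sketch is not the method developed in this paper (which avoids listing all n! rankings directly); it is only a brute-force illustration, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def rank_difference_distribution(n):
    """Frequency of each value of sum d^2 over the n! re-rankings of 1..n."""
    counts = Counter()
    for ranking in permutations(range(1, n + 1)):
        counts[sum((a - i) ** 2 for i, a in enumerate(ranking, start=1))] += 1
    return dict(sorted(counts.items()))


def rank_correlation(sum_d2, n):
    """r' = 1 - 6 * sum d^2 / (n^3 - n), kept as an exact fraction."""
    return 1 - Fraction(6 * sum_d2, n**3 - n)


for sum_d2, frequency in rank_difference_distribution(3).items():
    print(sum_d2, rank_correlation(sum_d2, 3), frequency)
```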

If n individuals are ranked 1, 2, 3, ..., n, by one criterion and then are
re-ranked at random there are n! possibilities for the new ranking. Let us consider
the differences between the numbers in the new and in the original rankings.
Suppose these differences are represented by $d_1, d_2, \ldots, d_n$. Then it is apparent
that $\sum_{i=1}^{n} d_i = 0$. If we let $a_1, a_2, \ldots, a_k$ represent an arrangement for n = k,
insert k + 1 after $a_k$ and advance the cycle one position at a time, we have the
following arrangements for the case n = k + 1:

\[
\begin{array}{ccccccc}
a_1, & a_2, & a_3, & \cdots & a_k, & k + 1 \\
a_2, & a_3, & a_4, & \cdots & k + 1, & a_1 \\
\cdot & \cdot & \cdot & \cdots & \cdot & \cdot \\
a_k, & k + 1, & a_1, & \cdots & a_{k-2}, & a_{k-1} \\
k + 1, & a_1, & a_2, & \cdots & a_{k-1}, & a_k
\end{array}
\tag{1}
\]

Now, for n = k, $d_1 = a_1 - 1$, $d_2 = a_2 - 2$, ..., $d_k = a_k - k$. If we list the
differences for the k + 1 derived arrangements, we have

\[
\begin{array}{cccccc}
d_1, & d_2, & d_3, & \cdots & d_k, & 0 \\
d_2 + 1, & d_3 + 1, & d_4 + 1, & \cdots & 1, & d_1 - k \\
d_3 + 2, & d_4 + 2, & \cdots & 2, & d_1 + 1 - k, & d_2 + 1 - k \\
\cdot & \cdot & \cdots & \cdot & \cdot & \cdot \\
d_k + k - 1, & k - 1, & d_1 - 2, & \cdots & d_{k-2} - 2, & d_{k-1} - 2 \\
k, & d_1 - 1, & d_2 - 1, & \cdots & d_{k-1} - 1, & d_k - 1
\end{array}
\tag{2}
\]


It is apparent that each row of differences is formed as follows: the entry in the 
first column is formed by adding 1 to the entry in column two in the row above, 
the entry in the second column is obtained by adding 1 to the entry in the third 
column in the row above, and so on until we come to the entry in the last column 
which is obtained by subtracting k from the entry in the first column of the 
preceding row. 

If we form the sum of squares of the entries in each row we observe an interesting
property of the set; the sums are all congruent, modulus (k + 1). Let us
write the sums, denoting them by $S_1, S_2, \ldots, S_{k+1}$. Also let $d_{i,j}$ represent the
entry in the ith row and jth column. Then

\[
\begin{aligned}
S_i &= \sum_{j=1}^{k+1} d_{i,j}^2 \\
S_{i+1} &= \sum_{j=1}^{k+1} d_{i+1,j}^2 = \sum_{j=2}^{k+1} (d_{i,j} + 1)^2 + (d_{i,1} - k)^2 \\
&= \sum_{j=1}^{k+1} (d_{i,j} + 1)^2 + (d_{i,1} - k)^2 - (d_{i,1} + 1)^2 \\
&= \sum_{j=1}^{k+1} (d_{i,j}^2 + 2d_{i,j} + 1) - (2d_{i,1} - k + 1)(k + 1) \\
&= S_i + 0 + (k + 1) - (2d_{i,1} - k + 1)(k + 1) \\
&= S_i + (k - 2d_{i,1})(k + 1).
\end{aligned}
\tag{3}
\]

Noticing that $d_{i,1} = d_i + i - 1$, for i = 1, 2, ..., k, and $d_{k+1,1} = k$, we have

\[
\begin{aligned}
S_2 &= S_1 + (k - 2d_1)(k + 1) \\
S_3 &= S_2 + (k - 2d_2 - 2)(k + 1) \\
S_4 &= S_3 + (k - 2d_3 - 4)(k + 1) \\
&\;\;\vdots \\
S_{k+1} &= S_k + (k - 2d_k - 2k + 2)(k + 1) \\
S_{k+2} &= S_{k+1} + (k - 2k)(k + 1) = S_{k+1} - k(k + 1).
\end{aligned}
\tag{4}
\]

Of course, $S_{k+2} = S_1$, as the (k + 2)nd row is identical with the first and the set is
closed. So we may write

\[
S_{k+1} = S_1 + k(k + 1). \tag{5}
\]

The analysis given above not only establishes the congruence of the sums,
modulus (k + 1), but also indicates a method of deriving the sums for n = k + 1
from the sums for n = k, since $S_1 = \sum_{i=1}^{k} d_i^2$. It is also worth noticing that
$S_{i+1}$ depends not only on $S_i$ (and therefore on $S_1$) but also on $d_{i,1}$ (and therefore
on $d_i$).


Another matter needs attention. It is the relation between the sums of
squares of deviations for a particular order and for the reverse order. Let
$a_1, a_2, \ldots, a_k$ be a particular arrangement. Then the reverse order is $a_k$,
$a_{k-1}, \ldots, a_1$. The sums of the squares of the deviates are, respectively,

\[
\begin{aligned}
S &= (a_1 - 1)^2 + (a_2 - 2)^2 + \cdots + (a_k - k)^2 \\
\bar{S} &= (a_k - 1)^2 + (a_{k-1} - 2)^2 + \cdots + (a_1 - k)^2.
\end{aligned}
\tag{6}
\]

Then

\[
\begin{aligned}
S + \bar{S} &= [(a_1 - 1)^2 + (a_1 - k)^2] + [(a_2 - 2)^2 + (a_2 - k + 1)^2] + \cdots + [(a_k - k)^2 + (a_k - 1)^2] \\
&= \sum_{r=1}^{k} \left[(a_r - r)^2 + (a_r - k + r - 1)^2\right] \\
&= \sum_{r=1}^{k} \left[(a_r - r) + (a_r - k + r - 1)\right]^2 - 2\sum_{r=1}^{k} (a_r - r)(a_r - k + r - 1) \\
&= 4\sum_{r=1}^{k} a_r^2 - 4(k + 1)\sum_{r=1}^{k} a_r + (k + 1)^2\sum_{r=1}^{k} 1 - 2\sum_{r=1}^{k} a_r^2 \\
&\quad + 2(k + 1)\sum_{r=1}^{k} a_r - 2(k + 1)\sum_{r=1}^{k} r + 2\sum_{r=1}^{k} r^2.
\end{aligned}
\]

Noting that $\sum a_r^2 = \sum r^2$ and $\sum a_r = \sum r$, we readily obtain the result⁴

\[
S + \bar{S} = \frac{k^3 - k}{3}. \tag{7}
\]

It is now apparent that the sums range from 0 to $\dfrac{k^3 - k}{3}$ with a mean of $\dfrac{k^3 - k}{6}$.
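Result (7) is easily confirmed for small k by trying every arrangement and its reverse. The sketch below is a brute-force illustration with hypothetical function names.

```python
from itertools import permutations


def sum_of_squares(arrangement):
    """Sum of (a_r - r)^2 for an arrangement a_1, ..., a_k of 1..k."""
    return sum((a - r) ** 2 for r, a in enumerate(arrangement, start=1))


def reverse_identity_holds(k):
    """True if S + S-bar = (k^3 - k)/3 for every arrangement of 1..k."""
    target = (k**3 - k) // 3  # k^3 - k is always divisible by 3
    return all(sum_of_squares(p) + sum_of_squares(p[::-1]) == target
               for p in permutations(range(1, k + 1)))


print([reverse_identity_holds(k) for k in range(1, 7)])
```

Since the reverse of the identity arrangement gives the largest sum, the same computation also exhibits the range from 0 to $(k^3 - k)/3$.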

<J t) 

As the exact frequencies for sums of squares do not seem to be available, it
seems useful to compute them for certain small values of n and, at the same time,
to devise a method which can be used successfully to extend the computation to
larger values of n if desired. The details of the method follow.

⁴ The geometric representation of the problem may be of some interest. Let the
coordinates of point $R$ in Euclidean n-space be (1, 2, 3, ..., n), the coordinates of $\bar{R}$ be
(n, n - 1, ..., 2, 1), and the coordinates of P be $(x_1, x_2, \ldots, x_n)$. Let us restrict the x's to be
the numbers (1, 2, 3, ..., n), but not necessarily in the order given, i.e. the locus of P is a set
of n! points, corresponding to the permutations of the numbers 1, 2, 3, ..., n. Then it is easy
to see that $\sum_{i=1}^{n} x_i = \dfrac{n(n + 1)}{2}$ and that points P lie on an n-flat or hyperplane. Also
$\sum_{i=1}^{n} x_i^2 = \dfrac{n(n + 1)(2n + 1)}{6}$ so points P lie on a hypersphere with center at the origin. Let
us consider the joins $PR$ and $P\bar{R}$. It is readily established that they are orthogonal. Then
$(PR)^2 + (P\bar{R})^2 = (R\bar{R})^2 = \dfrac{n^3 - n}{3}$, or, since $S = \sum_{i=1}^{n} (x_i - i)^2$ and
$\bar{S} = \sum_{i=1}^{n} (x_i - n + i - 1)^2$, $S + \bar{S} = \dfrac{n^3 - n}{3}$, a result previously established otherwise.

Let $D_n$ represent any series of n differences, $d_1, d_2, \ldots, d_n$, and let $O_n$ be an
operator such that $O_n$ operating on $D_n$ (written $O_n(D_n)$) means that
$D_n = (d_1, d_2, \ldots, d_n)$ is changed to $(d_2 + 1, d_3 + 1, \ldots, d_n + 1, d_1 - (n - 1))$. Let m,
written following $d_1, d_2, \ldots, d_n$, indicate that $\sum d^2 = m$. For n = 3

\[
\begin{aligned}
D_{3,1} &= (0, 0, 0) : 0 \\
O_3(D_{3,1}) = D_{3,2} &= (1, 1, -2) : 6 \\
O_3(D_{3,2}) = D_{3,3} &= (2, -1, -1) : 6
\end{aligned}
\]

But we have shown that $S + \bar{S} = \dfrac{k^3 - k}{3}$ for n = k. Therefore, for n = 3, we
have $S + \bar{S} = 8$, so sums of 0 and 6 indicate corresponding sums of 8 and 2
when the order of the elements is reversed. Thus we have, for n = 3,

      Sums           0    2    4    6    8
      Frequencies    1    2    0    2    1
For n = 4 we have

\[
\begin{aligned}
D_{4,1,1} &= (0, 0, 0, 0) \\
D_{4,2,1} &= (1, 1, -2, 0) \\
D_{4,3,1} &= (2, -1, -1, 0)
\end{aligned}
\]

where these are obtained from $D_{3,1}$, $D_{3,2}$ and $D_{3,3}$ respectively by inserting a
zero as a fourth difference. We operate on each of these four times with $O_4$.
For example,

\[
\begin{aligned}
D_{4,2,1} &= (1, 1, -2, 0) : 6 \\
O_4(D_{4,2,1}) = D_{4,2,2} &= (2, -1, 1, -2) : 10 \\
O_4(D_{4,2,2}) = D_{4,2,3} &= (0, 2, -1, -1) : 6 \\
O_4(D_{4,2,3}) = D_{4,2,4} &= (3, 0, 0, -3) : 18 \\
O_4(D_{4,2,4}) = D_{4,2,1} &= (1, 1, -2, 0)
\end{aligned}
\]

As a check on computation, we notice, first, that the set is closed by the
reappearance of $D_{4,2,1}$; and, second, that 10, 6, 18 and 6 are congruent, modulus 4.
In like fashion, one of the sets for n = 5 is the following:

\[
\begin{aligned}
D_{5,2,4,1} &= (3, 0, 0, -3, 0) : 18 \\
D_{5,2,4,2} &= (1, 1, -2, 1, -1) : 8 \\
D_{5,2,4,3} &= (2, -1, 2, 0, -3) : 18 \\
D_{5,2,4,4} &= (0, 3, 1, -2, -2) : 18 \\
D_{5,2,4,5} &= (4, 2, -1, -1, -4) : 38 \\
D_{5,2,4,1} &= (3, 0, 0, -3, 0)
\end{aligned}
\]

Of course the sums for n = 5 can be obtained from those for n = 4 by making
use of (4). For $D_{4,2,4} = (3, 0, 0, -3) : 18$ we have $S_1 = 18$, k = 4, $d_1 = 3$, $d_2 = 0$,
$d_3 = 0$, $d_4 = -3$. Then

\[
\begin{aligned}
S_1 &= 18 \\
S_2 &= S_1 + (4 - 2 \cdot 3)(5) = 8 \\
S_3 &= S_2 + (4 - 2 \cdot 0 - 2)(5) = 18 \\
S_4 &= S_3 + (4 - 2 \cdot 0 - 4)(5) = 18 \\
S_5 &= S_4 + (4 - 2 \cdot (-3) - 6)(5) = 38 \\
S_1 &= S_5 - 4 \cdot 5 = 18.
\end{aligned}
\]

However, results obtained by this latter method do not help with the case of
n = 6. If we desire to obtain results for n = 6 we will need to exhibit the
complete sets of differences for n = 5 as we did by the former method.
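The former method, in which complete sets of differences are carried from n = k to n = k + 1, can be sketched as a short recursive routine. The function names below are hypothetical, and the routine stores every set of differences, so it is practical only for small n.

```python
from collections import Counter


def advance(diffs):
    """Operator O_n: (d1, ..., dn) -> (d2 + 1, ..., dn + 1, d1 - (n - 1))."""
    n = len(diffs)
    return tuple(d + 1 for d in diffs[1:]) + (diffs[0] - (n - 1),)


def cycle(start):
    """The n sets of differences reached from `start` by repeated use of O_n."""
    sets, current = [], tuple(start)
    for _ in range(len(current)):
        sets.append(current)
        current = advance(current)
    assert current == tuple(start)  # the set is closed
    return sets


def difference_sets(n):
    """All n! sets of differences, built from those for n - 1 by inserting
    a zero as the last difference and operating n times with O_n."""
    if n == 1:
        return [(0,)]
    return [d for smaller in difference_sets(n - 1) for d in cycle(smaller + (0,))]


def frequencies(n):
    """Frequency of each sum of squares of rank differences."""
    counts = Counter(sum(d * d for d in diffs) for diffs in difference_sets(n))
    return dict(sorted(counts.items()))


sums = [sum(d * d for d in diffs) for diffs in cycle((3, 0, 0, -3, 0))]
print(sums)
print(frequencies(4))
```

The list `sums` reproduces the sums of the n = 5 set exhibited above, and each cycle may be checked for congruence, modulus n, as in the text.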

An alternative method for obtaining frequencies of sums of squares is of some
interest. It will be illustrated for n = 4.

Let us consider the square array

\[
\begin{pmatrix}
a_1 & b_1 & c_1 & d_1 \\
a_2 & b_2 & c_2 & d_2 \\
a_3 & b_3 & c_3 & d_3 \\
a_4 & b_4 & c_4 & d_4
\end{pmatrix}
\]

If we form all possible products $a_i b_j c_k d_l$ ($i, j, k, l = 1, 2, 3, 4$; $i \ne j \ne k \ne l$),
the subscripts give the 4! permutations of 1, 2, 3, 4. Now let us form a new array

\[
\begin{pmatrix}
a_0 & b_1 & c_2 & d_3 \\
a_{-1} & b_0 & c_1 & d_2 \\
a_{-2} & b_{-1} & c_0 & d_1 \\
a_{-3} & b_{-2} & c_{-1} & d_0
\end{pmatrix}
\]

where subscripts in each column represent the vertical distance of the term
above the principal diagonal. Since the original terms had subscripts giving all
possible arrangements of 1, 2, 3, 4, terms formed in a similar fashion from the
new array will give all possible arrangements of the differences. Now form a
third array

\[
\begin{pmatrix}
x^0 & x^1 & x^4 & x^9 \\
x^1 & x^0 & x^1 & x^4 \\
x^4 & x^1 & x^0 & x^1 \\
x^9 & x^4 & x^1 & x^0
\end{pmatrix}
\]

where the exponent of x is the square of the corresponding subscript in the 



SUMS OP SQUARES OP RANK DIFFERENCES 


139 


TABLE I 


Frequencies of sums of squares of rank differences 



•The asterisk shows the location of the mean. The frequencies for n « 8,7 extend be¬ 
yond the limits of the table but may easily be obtained by symmetry. 



































140 


E, G. OLDS 


second array. It is easy to see that, if terms are formed from the new array 
by the same method as before, our terms are powers of $ where the exponents 
represent sums of squares of differences. If we now define the array to be equal 
to the sum of the terms formed from the array, then 

faa.’°' ”b fa# 4 T * • • + • • • faa?*® -}- fa£ 2<l , 


and the fas give the desired frequencies for sums of squares corresponding to 
exponents of tc. For example 2d 3 = 0 occurs ki times, "Sd 2 = 2 occurs fa times, 
etc. 

It can be readily verified that, for n < 5, the array can be expanded as a determinant and the values of the k's can be obtained by taking the absolute values of the coefficients in the expansion. Also, considering the arrays as determinants, their values for n = 2, 3, 4 are, respectively, $(1 - x^2)$, $(1 - x^2)^2(1 - x^4)$, $(1 - x^2)^3(1 - x^4)^2(1 - x^6)$. If it were possible to obtain a general form of this type it might be possible to greatly reduce the labor which is involved in expanding the arrays. At present, however, this method of attack does not seem feasible on account of the lack of adequate sub-checks, the amount of work involved, and its inappropriateness for use by inexperienced clerical help.
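For small n the expansion can be checked by direct enumeration. The following is a minimal sketch, not the author's procedure: it counts sums of squares over all n! arrangements and compares them with the absolute coefficients of the three determinant values quoted above. The helper names and the `QUOTED_FACTORS` table are illustrative assumptions, and no general rule is assumed beyond n = 2, 3, 4.

```python
from collections import Counter
from itertools import permutations


def frequencies(n):
    """Frequencies of sum d^2 = 0, 2, 4, ... over all n! arrangements."""
    counts = Counter(
        sum((p - i) ** 2 for i, p in enumerate(perm, start=1))
        for perm in permutations(range(1, n + 1))
    )
    largest = (n ** 3 - n) // 3  # sum of squares for the reversed order
    return [counts.get(s, 0) for s in range(0, largest + 1, 2)]


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists, lowest power first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# Factors (1 - x^(2j))^e of the determinant values quoted for n = 2, 3, 4,
# stored as (j, e) pairs; only these three printed cases are covered.
QUOTED_FACTORS = {
    2: [(1, 1)],
    3: [(1, 2), (2, 1)],
    4: [(1, 3), (2, 2), (3, 1)],
}


def quoted_determinant(n):
    """Coefficients of x^0, x^2, x^4, ... in the quoted determinant value."""
    poly = [1]
    for j, e in QUOTED_FACTORS[n]:
        factor = [1] + [0] * (j - 1) + [-1]  # 1 - y^j with y = x^2
        for _ in range(e):
            poly = poly_mul(poly, factor)
    return poly


for n in (2, 3, 4):
    print(n, frequencies(n), quoted_determinant(n))
```

Enumeration grows as n!, so this check is practical only for the small values of n treated here.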

Hotelling and Pabst⁵ have given exact results in terms of n for the cases $\Sigma d^2 = 0, 2, 4, 6$. It is certainly possible to follow their method to obtain general results for $\Sigma d^2$ larger than 6, but, as they suggest, the work becomes very laborious. For $\Sigma d^2 = 8$ we need the sets of possible integral values for $x_1, x_2, \ldots, x_n$ under the following conditions: (a) $\sum_{i=1}^{n} x_i = 0$; (b) $\sum_{i=1}^{n} x_i^2 = 8$; (c) $1 + x_1$, $2 + x_2$, $3 + x_3$, $\ldots$, $n + x_n$ are the numbers 1, 2, 3, $\ldots$, n (but not necessarily in that order).

Possible solutions are:

(a) $x_{i-2} = 2$, $x_{i-1} = 0$, $x_i = -2$ ($i = 3, 4, \ldots, n$) and the other x's zero,

(b) $x_{b-2} = 2$, $x_{b-1} = -1$, $x_b = -1$, $x_{a-1} = 1$, $x_a = -1$ ($a = 5, 6, \ldots, n$; $b = 3, 4, \ldots, a - 2$),

(c) $x_{b-1} = 1$, $x_b = -1$, $x_{a-2} = 2$, $x_{a-1} = -1$, $x_a = -1$ ($a = 5, 6, \ldots, n$; $b = 2, 3, \ldots, a - 3$),

(d) $x_{b-2} = 1$, $x_{b-1} = 1$, $x_b = -2$, $x_{a-1} = 1$, $x_a = -1$ ($a = 5, 6, \ldots, n$; $b = 3, 4, \ldots, a - 2$),

(e) $x_{b-1} = 1$, $x_b = -1$, $x_{a-2} = 1$, $x_{a-1} = 1$, $x_a = -2$ ($a = 5, 6, \ldots, n$; $b = 2, 3, \ldots, a - 3$),

(f) $x_{d-1} = x_{c-1} = x_{b-1} = x_{a-1} = 1$; $x_d = x_c = x_b = x_a = -1$ ($a = 8, 9, \ldots, n$; $b = 6, 7, \ldots, a - 2$; $c = 4, 5, \ldots, b - 2$; $d = 2, 3, \ldots, c - 2$).

Frequencies for each of these types must be considered separately. The method of evaluation will be illustrated for type (f), since this type yields the polynomial of highest degree. It is apparent that the required frequency is obtained by computing

$$\sum_{a=8}^{n} \sum_{b=6}^{a-2} \sum_{c=4}^{b-2} \sum_{d=2}^{c-2} (1).$$

It can be verified that the result is

$$\frac{(n-4)^{(4)}}{4!} = \frac{(n-4)(n-5)(n-6)(n-7)}{24}.$$

The total of (a), (b), (c), (d), (e), and (f) is

$$(n - 2) + 2(n - 3)^{(2)} + \frac{(n-4)^{(4)}}{4!}.$$

For $\Sigma d^2 = 10$, the result seems to be

$$2(n - 3) + (n - 3)^{(2)} + (n - 4)^{(3)} + \frac{(n-5)^{(5)}}{5!}.$$

For sums greater than 8 the method becomes quite uninviting, not only because of the intricacy of the necessary analysis, but also because of the opportunities for mechanical errors and the absence of satisfactory checks. Besides, if the exact distribution for a particular value of n is desired, we need expressions for $\Sigma d^2 = 0, 2, 4, \ldots, \dfrac{n^3 - n}{6} - 2$. For n as small as 8, this means the requirement of 42 formulas. It is fairly evident that these formulas will comprise polynomials ranging in degree from 0 to 41.

⁵ Loc. cit., p. 36.
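A formula of this kind is easily compared with direct enumeration for small n. The sketch below assumes that the total for a sum of squares of 8 reads (n - 2) + 2(n - 3)(n - 4) + C(n - 4, 4), which is a reading of an incompletely legible expression rather than a quotation; the function names are illustrative.

```python
from itertools import permutations
from math import comb


def count_by_enumeration(n, target):
    """Number of arrangements of 1..n whose squared rank differences sum to target."""
    return sum(
        1
        for perm in permutations(range(1, n + 1))
        if sum((p - i) ** 2 for i, p in enumerate(perm, start=1)) == target
    )


def count_sum_eight(n):
    """Assumed total for sum d^2 = 8: (n - 2) + 2(n - 3)(n - 4) + C(n - 4, 4), n >= 4."""
    return (n - 2) + 2 * (n - 3) * (n - 4) + comb(max(n - 4, 0), 4)


for n in range(4, 8):
    print(n, count_by_enumeration(n, 8), count_sum_eight(n))
```

Agreement for n = 4 to 7 supports the reading but does not prove the formula for all n.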

III. APPROXIMATIONS

Since the exact distributions of sums of squares are not easily obtained, we next consider the problem of finding approximations for them. Hotelling and Pabst⁶ have given a method of deriving the even moments of the distribution of r', (the odd moments being zero), and have recorded the values of the second and fourth moments. They have also remarked that the kurtosis, $\beta_2 = \mu_4/\mu_2^2$, approaches 3 and that the distribution of r' approaches normality as n approaches infinity. These are valuable and interesting results. Because of them the normal curve suggests itself as an approximating function. Its use is considered a little later in this investigation.

But a distribution with a finite range causes trouble at the tails when a normal fit is attempted, and, for this problem, we are particularly interested in the tails. It seems more feasible to attempt an approximation with the Pearson type II curve,

$$y = y_0 \left(1 - \frac{x^2}{a^2}\right)^m.$$

This has the advantage of a finite range and three constants to be determined. The values of these constants, as given by Elderton⁷ are

$$m = \frac{5\beta_2 - 9}{2(3 - \beta_2)}, \qquad a^2 = \frac{2\mu_2\beta_2}{3 - \beta_2}, \qquad y_0 = \frac{N\,\Gamma(2m+2)}{a \cdot 2^{2m+1}\,[\Gamma(m+1)]^2} \qquad (8)$$

(where N is the total frequency).

⁶ Loc. cit., p. 32 et seq.

If we use this distribution to approximate the distribution of sums of squares, it proves convenient to define x as equal to one-half the deviation of $\Sigma d^2$ from its mean, i.e.,

$$x = \frac{\Sigma d^2}{2} - \frac{n^3 - n}{12}.$$

Then the relative frequency of $\Sigma d^2 = k$ is approximated by

$$\int_{x_k - \frac12}^{x_k + \frac12} f(x)\,dx \doteq f(x_k), \qquad \text{where } x_k = \frac{k}{2} - \frac{n^3 - n}{12}.$$

(Of course, closer approximations may be obtained, if desired). The approximation used is clear if we remember that only even values of k are possible and that the range is now $\dfrac{n^3 - n}{6}$.

The moments for x are now obtained from the moments for r' by multiplying by the proper powers of $-\dfrac{n^3 - n}{12}$. We have

$$\mu_2 = \left(\frac{n^3 - n}{12}\right)^2 \frac{1}{n - 1}.$$

The value of $\beta_2$ is unchanged. For r' or x it is

$$\beta_2 = \frac{3(25n^4 - 13n^3 - 73n^2 + 37n + 72)}{25n(n + 1)^2(n - 1)}.$$

For n = 5, $\mu_2 = 25$, $\beta_2 = 2.0720$, N = 5!. Using these values and equations (8), we obtain a = 10.566, m = .73276, $y_0 = 7.8545$. The approximating function is

$$y = 7.8545\left(1 - \frac{x^2}{111.64}\right)^{.73276}.$$

In table II the computed values of y and the true frequencies are listed for comparison.
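The constants for n = 5 can be reproduced numerically. The sketch below is illustrative only: it assumes the second moment of x equals n²(n + 1)²(n - 1)/144 (from the variance 1/(n - 1) of r'), takes the kurtosis polynomial as written above, and uses function names that do not appear in the paper.

```python
from math import factorial, gamma, sqrt


def type_two_constants(n):
    """Return (mu2, beta2, m, a, y0) for x = (sum d^2 - mean) / 2."""
    total = factorial(n)  # N, the total frequency
    mu2 = n ** 2 * (n + 1) ** 2 * (n - 1) / 144
    beta2 = 3 * (25 * n ** 4 - 13 * n ** 3 - 73 * n ** 2 + 37 * n + 72) / (
        25 * n * (n + 1) ** 2 * (n - 1)
    )
    m = (5 * beta2 - 9) / (2 * (3 - beta2))
    a = sqrt(2 * mu2 * beta2 / (3 - beta2))
    y0 = total * gamma(2 * m + 2) / (a * 2 ** (2 * m + 1) * gamma(m + 1) ** 2)
    return mu2, beta2, m, a, y0


def ordinate(n, k):
    """Type II ordinate approximating the frequency of sum d^2 = k."""
    _, _, m, a, y0 = type_two_constants(n)
    x = k / 2 - (n ** 3 - n) / 12
    return y0 * (1 - x * x / (a * a)) ** m


print(type_two_constants(5))
print(round(ordinate(5, 4), 2))
```

Only values of k with |x| < a give a real ordinate, which reflects the finite range of the type II curve.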

When testing the significance of a particular value of $\Sigma d^2$ our principal interest is in the probability that $\Sigma d^2 \le k$, rather than in the probability that $\Sigma d^2 = k$. The probability that $\Sigma d^2 \le k$ requires cumulation of frequencies, followed by division by the total frequency. If results, given in table II, are compared it is noticed that the maximum error in using the type II function is .0194 and the average error is .0072. Comparisons for other values of n are given in table III.

⁷ Elderton, W. P., Frequency Curves and Correlation, Layton, London, 2nd ed., 1927, p. 84.



TABLE II

Comparison of exact and approximate frequencies for n = 5

(Approximations obtained by computing ordinates of $y = 7.8545\left(1 - \frac{x^2}{111.64}\right)^{.73276}$)

              Frequencies          Cumulative (expressed      Difference of
                                   as percent of 120)         cumulatives
 Σd²     Exact   Approximate      Exact     Approximate
   0        1       1.50          .0083       .0125             -.0042
   2        4       [?]           .0417       .0378             +.0039
   4        3       4.21          .0667       .0729             -.0062
   6        6       5.14          .1167       .1157             +.0010
   8        7       5.91          .1750       .1650             +.0100
  10        6       6.52          .2250       .2193             +.0057
  12        4       7.01          .2583       .2777             -.0194
  14       10       7.39          .3417       .3393             +.0024
  16        6       7.65          .3917       .4031             -.0114
  18       10       7.80          .4750       .4681             +.0069
  20        6       7.85          .5250       .5335             -.0085

                                   average of absolute values = .0072


TABLE III

Approximating functions, with errors involved

                                                                                      Average and maximum absolute values
                                                                                      of differences of cumulatives
  n   Type II                                    Normal                               Exact - type II   Exact - normal   Type II - normal
  5   $7.8545(1 - x^2/111.64)^{.73276}$          $(5!/\sqrt{50\pi})\,e^{-x^2/50}$           .0072  .0194      .0200  .0415     .0210  .0357
  6   $31.652(1 - x^2/351.75)^{1.3715}$          $(6!/\sqrt{122.5\pi})\,e^{-x^2/122.5}$     .0030  .0126      .0131  .0273     .0136  .0270
  7   $156.33(1 - x^2/918.84)^{2.0160}$          $(7!/\sqrt{261.33\pi})\,e^{-x^2/261.33}$   .0017  .0067      .0106  .0221     .0108  .0209
  8   $[?]\,(1 - x^2/[?])^{2.6635}$              $(8!/\sqrt{504\pi})\,e^{-x^2/504}$                                            .0086  .0175
  9   $6276.3(1 - x^2/4332.6)^{3.3140}$          $(9!/\sqrt{900\pi})\,e^{-x^2/900}$
 10   $[?]\,(1 - x^2/8266.6)^{3.9655}$           $(10!/\sqrt{1512.5\pi})\,e^{-x^2/1512.5}$


It would be very convenient if the cumulative frequencies could be approximated by the use of normal curves. In table III are listed the proper normal curves, along with comparisons with the exact values and with the values obtained from the type II curves. For the values of n investigated the normal curve is not as satisfactory as the type II. This, of course, is to be expected because of the lack of agreement between the fourth moment of the normal curve and of the exact distribution. However, in view of the fact that, for values of n investigated, the maximum and average errors decrease as n increases, it seems satisfactory to sacrifice accuracy to expedience and use the normal curve as an approximating function for cases of n greater than 10. This has been done in constructing table V. In further justification it might be noted that $\beta_2$, which approaches 3 as n approaches infinity, is an increasing function of n for n greater than 3.


IV. TABLES TO TEST THE SIGNIFICANCE OF THE RANK CORRELATION COEFFICIENT, WITH EXAMPLES OF THEIR USE

Table IV gives the probability that, for any given value of n and a computed value of $\Sigma d^2$ less than or equal to the mean, the value will not be exceeded by chance. For a value of $\Sigma d^2$ greater than or equal to the mean, it gives the probability that the value will be equalled or exceeded. The values for n = 2, 3, 4, 5, 6, 7 are computed from exact frequencies; those for n = 8, 9, 10 are computed from type II curves.

Table V is constructed by the use of normal curves. It gives the limits of $\Sigma d^2$ for a few of the more useful probabilities.
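Limits of this kind can be sketched as follows. The code assumes that the normal curve is given the mean (n³ - n)/6 and the standard deviation n(n + 1)√(n - 1)/6, the latter following from a variance of 1/(n - 1) for r'; the function name is illustrative and the original computing procedure is not known.

```python
from math import sqrt
from statistics import NormalDist


def normal_limits(n, prob):
    """Symmetric limits that contain sum d^2 with probability prob under a normal curve."""
    mean = (n ** 3 - n) / 6
    sigma = n * (n + 1) * sqrt(n - 1) / 6  # assumed standard deviation of sum d^2
    z = NormalDist().inv_cdf(0.5 + prob / 2)
    return mean - z * sigma, mean + z * sigma


low, high = normal_limits(13, 0.96)
print(round(low, 1), round(high, 1))
```

For n = 13 and P = .96 this gives limits close to the pair 148.2 and 579.8 used in Example 3.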

It seems desirable to explain why values of $\Sigma d^2$ were tabled rather than values of r'. It was done for two reasons: first, to avoid the difficulties arising from discrete variates; and, second, because the tables seem more useful in the form given since the labor of completing the calculation of r' can be avoided if the computed value of $\Sigma d^2$ tests as not significant.

Example 1. Seven individuals are ranked by two criteria, as indicated below. Are the results significantly alike?

A      1    2    3    4    5    6    7
B      2    1    6    3    4    7    5
d      1   -1    3   -1   -1    1   -2  |   0
d²     1    1    9    1    1    1    4  |  18

Solution: Rows 3 and 4 give the differences and squared differences, respectively. If we enter table IV with n = 7 and $\Sigma d^2 = 18$, we find P = .0548, so we would expect that a value as small as 18 would occur by chance more than 5% of the time. This does not usually indicate significance so it is useless to compute the value of r'. It is interesting to notice that r' actually does prove to be equal to .68 and that, if we had used the formula, $\sigma_{r'} = \dfrac{1.0471}{\sqrt{n}}\left(1 - r'^2\right)$, we might have

TABLE IV

The probability that $\Sigma d^2 \ge S$ for $S \ge \overline{\Sigma d^2}$ or that $\Sigma d^2 \le S$ for $S \le \overline{\Sigma d^2}$ (where $\overline{\Sigma d^2}$ represents mean value of sum of squares)

    n        2      3      4      5      6      7      8      9     10
 mean        1      4     10     20     35     56     84    120    165
    S
    0    .5000  .1667  .0417  .0083  .0014  .0002    [?]    [?]  .0000
    2    .5000  .5000  .1667  .0417  .0083  .0014  .0006    [?]  .0001
    4           .5000  .2083  .0667  .0167  .0034  .0011    [?]  .0001
    6           .5000  .3750  .1167  .0292  .0062  .0018    [?]  .0001
    8           .1667  .4583  .1750  .0514  .0119  .0028    [?]  .0002
   10                  .5417  .2250  .0681  .0171  .0042    [?]  .0003
   12                  .4583  .2583  .0875  .0240  .0059  .0015  .0004
   14                  .3750  .3417  .1208  .0331  .0081  .0020  .0005
   16                  .2083  .3917  .1486  .0440  .0108  .0027  .0007
   18                  .1667  .4750  .1778  .0548  .0141    [?]    [?]
   20                  .0417  .5250  .2097  .0694  .0179    [?]    [?]
   22                         .4750  .2486  .0833  .0224  .0057    [?]
   24                         .3917  .2819  .1000  .0275    [?]  .0018
   26                         .3417  .3292  .1179  .0331    [?]  .0022
   28                         .2583  .3569  .1333  .0396    [?]  .0027
   30                         .2250  .4014  .1512  .0469  .0127  .0032
   32                         .1750  .4597  .1768  .0550  .0152  .0039
   34                         .1167  .5000  .1978  .0639  .0179  .0046
   36                         .0667  .5000  .2222    [?]  .0210    [?]
   38                         .0417  .4597  .2488  .0841    [?]    [?]
   40                         .0083  .4014  .2780  .0956    [?]    [?]
   42                                .3569  .2974  .1078  .0323  .0086
   44                                .3292    [?]  .1207  .0368  .0100
   46                                .2819  .3565  .1345  .0417  .0114
   48                                .2486  .3913  .1491  .0470  .0130
   50                                .2097  .4198  .1645  .0528  .0148
   52                                .1778  .4532  .1806  .0589  .0168
   54                                .1486  .4817  .1974  .0656  .0189
   56                                .1208  .5183  .2150  .0726  .0213
   58                                .0875  .4817  .2332  .0803  .0237


























































TABLE IV (Concluded)

    n        8      9     10    |      n        8      9     10
 mean       84    120    165    |   mean       84    120    165
    S                           |      S
  126    .1078  .4568  .2580    |    168           .1459  .4865
  128    .0956  .4397  .2694    |    170           .1351  .4731
  130    .0841  .4226  .2810    |    172           .1248  .4596
  132      [?]    [?]  .2928    |    174           .1149  .4462
  134    .0639  .3888  .3048    |    176             [?]  .4328
  136    .0550  .3721  .3169    |    178             [?]  .4196
  138    .0469  .3557  .3293    |    180             [?]  .4063
  140    .0396  .3394  .3418    |    182           .0803  .3931
  142    .0331  .3234  .3545    |    184           .0726  .3802
  144    .0275  .3077  .3673    |    186           .0656  .3673
  146    .0224  .2922  .3802    |    188           .0589  .3545
  148    .0179  .2770  .3932    |    190           .0528  .3418
  150    .0141  .2622  .4063    |    192           .0470  .3293
  152    .0108  .2477  .4196    |    194           .0417  .3169
  154    .0081  .2336  .4328    |    196           .0368  .3048
  156    .0059  .2198  .4462    |    198           .0323  .2928
  158    .0042  .2065  .4596    |    200             [?]  .2810
  160    .0028  .1935  .4731    |    202             [?]  .2694
  162    .0018  .1809  .4865    |    204           .0210  .2580
  164    .0011  .1688  .5000    |    206           .0179  .2468
  166    .0006  .1571  .5000    |    208           .0152  .2358

(Tables for cases 9 and 10 can be completed by symmetry.)


judged the value of r' significant, since $\sigma_{r'} = .213$, and .213 is less than one-third of .68.

Example 2. Six golfers found, upon ranking their scores and also ranking their respective amounts of sleep for the previous night, that the two orders were the reverse of one another except that the two ranking 1, 2 in sleep ranked 5, 6 in score. Is the negative correlation too great to be reasonably attributed to chance?

Solution: We find $\Sigma d^2 = 68$ and, upon consulting table IV, P = .0083, so we conclude that more sleep might mean fewer strokes.
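The two examples can be reproduced by direct enumeration, which is feasible for n as small as 6 or 7. This is a sketch with illustrative function names; it assumes the tabled probability is the lower tail when S is at most the mean and the upper tail otherwise, and uses the usual relation r' = 1 - 6Σd²/(n³ - n).

```python
from itertools import permutations


def sum_sq_diff(rank_a, rank_b):
    """Sum of squared differences between two rankings of the same individuals."""
    return sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))


def rank_correlation(s, n):
    """r' computed from a sum of squares s for n individuals."""
    return 1 - 6 * s / (n ** 3 - n)


def exact_tail(n, s):
    """P(sum d^2 <= s) when s is at most the mean, otherwise P(sum d^2 >= s)."""
    mean = (n ** 3 - n) / 6
    identity = range(1, n + 1)
    hits = total = 0
    for perm in permutations(identity):
        value = sum_sq_diff(identity, perm)
        total += 1
        if (s <= mean and value <= s) or (s > mean and value >= s):
            hits += 1
    return hits / total


s1 = sum_sq_diff([1, 2, 3, 4, 5, 6, 7], [2, 1, 6, 3, 4, 7, 5])  # Example 1
s2 = sum_sq_diff([1, 2, 3, 4, 5, 6], [5, 6, 4, 3, 2, 1])        # Example 2
print(s1, round(rank_correlation(s1, 7), 2), round(exact_tail(7, s1), 4))
print(s2, round(exact_tail(6, s2), 4))
```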

Example 3. Before an examination a teacher ranked his class of 13 members. After the examination, he found that the sum of the squares of the deviations of rank on examination from rank estimated was 144. Should he consider the agreement satisfactory?


TABLE V

Pairs of values between which $\Sigma d^2$ has a probability, P, of being included

  n         P = .99              P = .98              P = .96              P = .90              P = .80
 11      40.8    399.2        58.2    381.8        77.1    362.9       105.6    334.4       130.8    309.2
 12      63.9    508.1        85.4    486.6       108.9    463.1       144.2    427.8       175.5    396.5
 13      93.3    634.7       119.6    608.4       148.2    579.8       191.2    536.8       229.3    498.7
 14     129.9    780.1       161.4    748.6       195.8    714.2       247.4    662.6       293.3    616.7
 15     174.5    945.5       211.8    908.2       252.6    867.4       313.8    806.2       368.2    751.8
 16     227.8   1132.2       271.6   1088.4       319.4   1040.6       391.2    968.8       455.0    905.0
 17     290.5   1341.5       341.4   1290.6       397.0   1235.0       480.4   1151.6       554.6   1077.4
 18     363.6   1574.4       422.3   1515.7       486.3   1451.7       582.4   1355.6       667.8   1270.2
 19     447.9   1832.1       514.9   1765.1       588.2   1691.8       698.0   1582.0       795.6   1484.4
 20     544.1   2115.9       620.2   2039.8       703.4   1956.6       828.1   1831.9       939.0   1721.0
 21     653.0   2427.0       738.9   2341.1       832.8   2247.2       973.6   2106.4      1098.7   1981.3
 22     775.5   2766.5       872.0   2670.0       977.3   2564.7      1135.3   2406.7      1275.7   2266.3
 23     912.5   3135.5      1020.2   3027.8      1137.8   2910.2      1314.2   2733.8      1471.0   2577.0
 24    1064.7   3535.3      1184.3   3415.7      1315.1   3284.9      1511.1   3088.9      1685.4   2914.6
 25    1233.0   3967.0      1365.4   3834.6      1510.1   3689.9      1727.0   3473.0      1919.8   3280.2
 26    1418.2   4431.8      1564.1   4285.9      1723.6   4126.4      1962.7   3887.3      2175.3   3674.7
 27    1621.1   4930.9      1781.6   4770.4      1956.5   4595.5      2219.2   4332.8      2452.6   4099.4
 28    1842.7   5465.3      2018.1   5289.9      2209.8   5098.2      2497.3   4810.7      2752.8   4555.2
 29    2083.7   6036.3      2275.1   5844.9      2484.3   5635.7      2797.9   5322.1      3076.7   5043.3
 30    2345.0   6645.0      2553.2   6436.8      2780.8   6209.2      3122.0   5868.0      3425.2   5564.8


Solution: Entering table V with n = 13 we see that P = .96 for a value between 148.2 and 579.8, and that P = .98 for a value between 119.6 and 608.4. Therefore the probability of not exceeding 144 by chance is between .02 and .01. It would seem that the teacher showed considerable knowledge of his class.


Carnegie Institute of Technology.


















NOTE ON CORRELATIONS 
By D. B. De Lury 


When the value of a correlation coefficient is to be estimated from a set of N pairs of observations, $(x_i, y_i)$, $i = 1, 2, \ldots, N$, the statistic ordinarily computed is, of course, the product-moment correlation coefficient,

$$r = s_{12}/(s_1 s_2),$$

where

$$n s_1^2 = \sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad n s_2^2 = \sum_{i=1}^{N} (y_i - \bar{y})^2, \qquad n s_{12} = \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}),$$

$$N\bar{x} = \sum_{i=1}^{N} x_i, \qquad N\bar{y} = \sum_{i=1}^{N} y_i, \qquad n = N - 1.$$

However, when x and y are known to have the same population mean and variance, the precision of the estimate may be improved slightly by using the intraclass correlation coefficient,

$$r_I = \frac{2\sum (x_i - \xi)(y_i - \xi)}{\sum \{(x_i - \xi)^2 + (y_i - \xi)^2\}}, \qquad 2N\xi = \sum (x_i + y_i).$$

It may be of interest to inquire into the properties of an analogous coefficient, appropriate to the case of equal variances and different means. This coefficient would naturally be chosen to be

$$u = 2s_{12}/(s_1^2 + s_2^2) = \{2s_1 s_2/(s_1^2 + s_2^2)\}\, r.$$

Obviously, $|u| \le |r|$.
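The relation between the two coefficients is easily illustrated numerically. The sketch below follows the definitions above, with divisor n = N - 1 throughout; the function name and the sample figures are hypothetical.

```python
from math import sqrt


def correlation_coefficients(xs, ys):
    """Return (r, u): product-moment r and u = 2 s12 / (s1^2 + s2^2)."""
    big_n = len(xs)
    n = big_n - 1
    x_bar = sum(xs) / big_n
    y_bar = sum(ys) / big_n
    s1_sq = sum((x - x_bar) ** 2 for x in xs) / n
    s2_sq = sum((y - y_bar) ** 2 for y in ys) / n
    s12 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    r = s12 / sqrt(s1_sq * s2_sq)
    u = 2 * s12 / (s1_sq + s2_sq)
    return r, u


# Hypothetical observations with unequal sample variances.
r, u = correlation_coefficients([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
print(round(r, 4), round(u, 4))
```

When the two sample variances happen to be equal the factor 2s₁s₂/(s₁² + s₂²) is unity and u coincides with r.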

The probability distribution of u is easily determined, under the assumption that x and y obey a bivariate normal distribution. If $\sigma^2$ is their common variance, no restriction is introduced by taking $\sigma = 1$. Then the probability element of $s_1$, $s_2$, r, is known to be¹

$$\frac{n^n}{\pi (n-2)!\,(1-\rho^2)^{n/2}}\,(s_1 s_2)^{n-1}\, e^{-\frac{n(s_1^2 - 2\rho r s_1 s_2 + s_2^2)}{2(1-\rho^2)}}\,(1 - r^2)^{\frac{n-3}{2}}\, ds_1\, ds_2\, dr,$$

where $\rho$ is the correlation of x and y. From this, the distribution of u can be obtained by making the transformation

$$u = \{2s_1 s_2/(s_1^2 + s_2^2)\}\, r, \qquad v = 2s_1 s_2/(s_1^2 + s_2^2), \qquad w = s_1^2 + s_2^2.$$

¹ R. A. Fisher, Biometrika, Vol. 10, p. 510.

Under this transformation, the range of $s_1$, $s_2$, r, determined by the inequalities $0 \le s_i < \infty$, $i = 1, 2$, $-1 \le r \le 1$, is mapped in a two-fold manner upon the space $-v \le u \le v$, $0 \le v \le 1$, $0 \le w < \infty$. For fixed u, v ranges from u to 1 or from $-u$ to 1, according as u is positive or negative, and w runs from 0 to $\infty$. The probability element of u, v, w, is found to be

$$\frac{n^n}{2^n \pi (n-2)!\,(1-\rho^2)^{n/2}}\; v\,(1 - v^2)^{-\frac12}\,(v^2 - u^2)^{\frac{n-3}{2}}\, w^{n-1}\, e^{-\frac{n w (1 - \rho u)}{2(1-\rho^2)}}\, du\, dv\, dw,$$

and the distribution of u, obtained by integrating with respect to v and w, is

$$K (1 - \rho^2)^{n/2} (1 - \rho u)^{-n} (1 - u^2)^{\frac{n-2}{2}}\, du.$$

If $\rho = 0$, the distribution of u is identical with that of r, the product-moment correlation coefficient (for $\rho = 0$), in samples of (N + 1) pairs of observations. Therefore, to test the hypothesis of independence, using the coefficient u, the methods and tables appropriate to testing the same hypothesis, using the coefficient r, are available. The precision gained by using u rather than r is equivalent to that supplied by another pair of observations.

In the general case, the transformation introduced by R. A. Fisher,²

$$u = \tanh z, \qquad \rho = \tanh \zeta,$$

leads to the distribution element³

$$K \operatorname{sech}^n (z - \zeta)\, dz.$$

This distribution is invariant in form under varying $\zeta$, and is effectively normal for samples of any size. In all cases, z is an unbiased estimate of $\zeta$.

² Metron, Vol. 1, N. 4, p. 7.

³ The distributions of u and z for n = 1 have been given by R. A. Fisher, Metron, Vol. 1, N. 4, p. 8.

The variance of z can be obtained by the following device. Denote by $I(2p, n)$ the 2p-th moment of z about the mean,

$$I(2p, n) = K \int_{-\infty}^{\infty} x^{2p} \operatorname{sech}^n x\, dx.$$

Integration by parts gives the recurrence formula,

$$I(2p, n) = \frac{n^2}{(2p+1)(2p+2)} \{ I(2p+2, n) - I(2p+2, n+2) \}, \qquad p \ge 0.$$

From this follows at once the relation

$$I(2p+2, n+2) = I(2p+2, 1) - (2p+1)(2p+2)\left\{ \frac{I(2p,1)}{1^2} + \frac{I(2p,3)}{3^2} + \cdots + \frac{I(2p,n)}{n^2} \right\}, \quad n \text{ odd},$$

$$I(2p+2, n+2) = I(2p+2, 2) - (2p+1)(2p+2)\left\{ \frac{I(2p,2)}{2^2} + \frac{I(2p,4)}{4^2} + \cdots + \frac{I(2p,n)}{n^2} \right\}, \quad n \text{ even}.$$

The values of $I(2p+2, 1)$ and $I(2p+2, 2)$ can be found without evaluating the integrals, by letting $n \to \infty$. It can be shown that $I(2p, n) = O(n^{-p})$, and hence $\lim_{n \to \infty} I(2p, n) = 0$ for $p > 0$. We obtain

$$I(2p+2, 1) = (2p+1)(2p+2)\left\{ \frac{I(2p,1)}{1^2} + \frac{I(2p,3)}{3^2} + \cdots \right\},$$

$$I(2p+2, 2) = (2p+1)(2p+2)\left\{ \frac{I(2p,2)}{2^2} + \frac{I(2p,4)}{4^2} + \cdots \right\}.$$

Hence, for all values of n and p, (replacing n + 2 by n),

$$I(2p+2, n) = (2p+1)(2p+2)\left\{ \frac{I(2p,n)}{n^2} + \frac{I(2p,n+2)}{(n+2)^2} + \frac{I(2p,n+4)}{(n+4)^2} + \cdots \right\}.$$

Setting p = 0 to get the variance,

$$\mu_2 = I(2, n) = 2\left\{ \frac{1}{n^2} + \frac{1}{(n+2)^2} + \frac{1}{(n+4)^2} + \cdots \right\}.$$

Therefore, making use of the fact that

$$\int_{m}^{\infty} x^{-2}\, dx < \sum_{i=m}^{\infty} i^{-2} < \int_{m-1}^{\infty} x^{-2}\, dx,$$

we find that

$$1/n < \mu_2 < 1/(n - 2),$$

and from the numerical values of $\mu_2$ for small values of n, it appears that the approximation $\mu_2 \approx 1/(n - 1)$ is satisfactory in all cases.
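The bounds and the approximation can be examined numerically. The sketch below sums the series 2{1/n² + 1/(n + 2)² + ...} directly; the function name, the number of terms, and the midpoint estimate of the neglected tail are illustrative choices, not part of the paper.

```python
def variance_of_z(n, terms=100000):
    """Series 2 * (1/n^2 + 1/(n+2)^2 + ...), with a midpoint estimate of the tail."""
    partial = 2 * sum(1 / (n + 2 * k) ** 2 for k in range(terms))
    tail = 1 / (n + 2 * terms - 1)  # integral of x^-2 from the midpoint before the first omitted term
    return partial + tail


for n in (3, 5, 10, 25):
    v = variance_of_z(n)
    print(n, round(1 / n, 5), round(v, 5), round(1 / (n - 1), 5), round(1 / (n - 2), 5))
```

The printed columns are 1/n, the series value, 1/(n - 1) and 1/(n - 2), so the quality of the approximation can be read off for each n.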

In the same way, it can be shown that

$$3/n^2 < \mu_4 < 3/(n - 2)^2.$$

Thus the method of transforming correlations to test for significance, used by R. A. Fisher in connection with both interclass and intraclass correlations, is available here also, and is, in fact, slightly simpler, owing to the absence of bias.

The coefficient u can, of course, be used in all situations where the intraclass coefficient is appropriate, (when the number of observations in each class is two), and conceivably in a small class of other cases as well. The test of significance is simpler using u instead of $r_I$, and the loss of precision is negligible.

University of Toronto.




INTERIOR AND EXTERIOR MEANS OBTAINED BY THE 
METHOD OF MOMENTS 

By Edward L. Dodd

1. Introduction: The Substitutive Mean. A very general mean based upon substitution was proposed by O. Chisini.¹ Briefly stated, this mean M of data $x_1, x_2, \ldots, x_n$ is a number which satisfies some equation of the form

(1) $G(M, M, \ldots, M) = G(x_1, x_2, \ldots, x_n).$

If, now,

(2) $M = F(x_1, x_2, \ldots, x_n)$

is an explicit expression of M, then for each value c which each of the arguments $x_i$ can take on,

(3) $F(c, c, \ldots, c) = c;$

or at least one value of this F is c.

Suppose now that $F(x_1, x_2, \ldots, x_n)$ is any function of $x_1, x_2, \ldots, x_n$, defined for at least one set of equal arguments c, and such that whenever defined for equal arguments c, at least one value of $F(c, c, \ldots, c) = c$. Such a function, I have called a substitutive mean.² Various extensions³ are immediate, such as the use of integration in place of summation. Indeed, point set functions or functionals may be used.⁴ Here I shall supplement (3) by a fairly common convention. If $F(c, c, \ldots, c)$ is not originally defined, but as $x_i \to c$ simultaneously, limit $F(x_1, x_2, \ldots, x_n) = c$, in this case, $F(c, c, \ldots, c)$ will be assigned its limiting value c, thus establishing continuity.

¹ O. Chisini, "Sul Concetto di media," Periodico di Matematiche, Series 4, Vol. 9, (1929), pp. 106-116.

² E. L. Dodd, "Internal and External Means Arising from the Scaling of Frequency Functions," These Annals, Vol. 8, (1937), pp. 12-20.

³ For an extension of Chisini's results, see Bruno de Finetti, "Sul Concetto di media," Giornale dell' Istituto Italiano degli Attuari, Vol. 2, (1931), pp. 369-396.

⁴ E. L. Dodd, "The Chief Characteristic of Statistical Means," Cowles Commission lecture, Colorado College Publication, General Series No. 208, (1936), pp. 89-92.

2. Location and scale as means. The purpose of this paper is to investigate the nature of the means which arise when the well known Method of Moments is used to estimate the values of two important parameters, namely, the location $\kappa$ and the scale $\alpha$ of a frequency function or distribution. These are taken as associated with the variable x of the distribution thus:

(4) $t' = (x - \kappa)/\alpha.$

The nature of the distribution is then "specified" by

(5) $y = \alpha^{-1}\phi(t'),$

where $\phi$ may contain other parameters, but in $\phi$ the $\kappa$ and $\alpha$ appear only in the $t'$ given by (4). For this mode of approach, the reader is referred to R. A. Fisher.⁵

The other parameters which may appear in $\phi$ will not be considered in this paper.

The parameters $\kappa$ and $\alpha$ are in general unknown and unknowable. However, we attempt to get close estimates k and a, of $\kappa$ and $\alpha$ respectively, from a set of observations

(6) $X_1, X_2, \ldots, X_n.$

To accomplish this, we have to solve certain equations formed in some way from

(7) $t = (x - k)/a,$

and

(8) $y = a^{-1}\phi(t).$

These equations (7) and (8) result from (4) and (5) by substituting estimates k and a, respectively, for parameters $\kappa$ and $\alpha$.

Now the Method of Moments equates the theoretic moments, those obtained from some such equation as (8) with t replaced by its value in (7), to the moments obtained from the observations (6).

For the following discussion it will be useful to obtain "auxiliary" moments from the $\phi(t)$ in (8) before substitution is made from (7). Such moments, then, do not depend at all upon the values ultimately assigned to k and a. It is supposed that

(9) $\int_{-\infty}^{\infty} \phi(t)\, dt = 1,$

so that $\phi(t)$ gives probability or relative frequency. Here, for finite distributions, $\phi(t) = 0$ outside the interval of the distribution. We shall assume the existence of the first moment

(10) $\mu = \int_{-\infty}^{\infty} t\,\phi(t)\, dt,$

and of the variance

(11) $\sigma^2 = \int_{-\infty}^{\infty} (t - \mu)^2 \phi(t)\, dt = \int_{-\infty}^{\infty} t^2 \phi(t)\, dt - \mu^2;$

and we shall assume that $\sigma > 0$, to eliminate a degenerate case.

⁵ R. A. Fisher, "On the Mathematical Foundations of Theoretical Statistics," Philosophical Transactions of the Royal Society of London, Series A, Vol. 222, (1921), pp. 309-368.


For the empirical moments of (6), we write

(12) $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n = \Sigma X_i/n,$

(13) $S = \Sigma X_i^2/n = \Sigma (X_i - \bar{X})^2/n + \bar{X}^2.$

These two moments are, by the Method of Moments, equated respectively to

(14) $M_1 = \dfrac{1}{a} \int_{-\infty}^{\infty} x\, \phi[(x - k)/a]\, dx,$

(15) $M_2 = \dfrac{1}{a} \int_{-\infty}^{\infty} x^2 \phi[(x - k)/a]\, dx.$

But, from (7), (10), and (11) it is easy to see that

(16) $M_1 = k + a\mu,$

(17) $M_2 = k^2 + 2ka\mu + a^2(\sigma^2 + \mu^2);$

from which it follows that

(18) $M_2 - M_1^2 = a^2\sigma^2.$

Suppose now that $r^2$ is the empirical variance,

(19) $r^2 = \Sigma (X_i - \bar{X})^2/n.$

It follows from (12) and (13), that if $M_1 = \bar{X}$ and $M_2 = S$, as the Method of Moments requires, then

(20) $a^2 = r^2/\sigma^2 = (S - \bar{X}^2)/\sigma^2.$

And, from (16),

(21) $k = \bar{X} - a\mu.$

These results may be expressed in the following theorem.

Theorem I. The estimated scale a in

(8) $y = a^{-1}\phi(t),$

where by (7), $t = (x - k)/a$, as obtained by the Method of Moments from observations $X_1, X_2, \ldots, X_n$, is the root-mean-square of $|X_i - \bar{X}|/\sigma$, where $\bar{X}$ is the arithmetic mean of the $X_i$'s, and $\sigma^2$ is the "theoretic" variance of $\phi(t)$ itself, as a function of t, with no reference to the k or a in (7).

Moreover, the estimated location k is a substitutive mean, characterized by (3), and given by

(21) $k = \bar{X} - a\mu = \bar{X} - [\Sigma (X_i - \bar{X})^2/n]^{1/2}\, \mu/\sigma.$

As regards this final statement, it will be seen that if each $X_i = c$, then $\bar{X} = c$; and hence k = c, as required by (3). We may say, then, that the right member of (21) obtained as the formal solution of equations which the problem sets up, is a substitutive mean of the elements $X_i$.
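Equations (20) and (21) translate directly into a short computation. This is a minimal sketch with a hypothetical function name; the first moment and standard deviation of the specifying function are supplied as assumed inputs `mu` and `sigma`.

```python
from math import sqrt


def moment_estimates(observations, mu, sigma):
    """Scale a and location k by the Method of Moments: a = r / sigma, k = mean - a * mu."""
    n = len(observations)
    mean = sum(observations) / n
    r = sqrt(sum((x - mean) ** 2 for x in observations) / n)  # empirical s.d., divisor n
    a = r / sigma
    k = mean - a * mu
    return a, k


# With first moment zero and unit variance, k reduces to the arithmetic mean.
print(moment_estimates([1, 2, 3, 4, 5], mu=0.0, sigma=1.0))
# Nearly equal observations give a location close to their common value.
print(moment_estimates([5.0, 5.0, 5.0001], mu=2.0, sigma=1.5))
```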



But if each $X_i = c$, then a = 0 in (21); and this a may not be used as a scale. However, if any two $X_i$'s are different, $a \ne 0$. And it is evident that as $X_i \to c$ simultaneously, limit k = c. If, then, we consider that the right member of (21) is not originally defined for equal values c of the elements $X_i$, it is to be given its "continuity" value c, in accordance with the common convention already mentioned.

In the special case where the function $\phi$ in (5) chosen to specify the distribution has a first moment $\mu$ equal to zero, the estimate k of location given by (21) is seen at once to be the arithmetic mean of the observations $X_1, X_2, \ldots, X_n$.

3. External means. In the papers cited, Chisini and de Finetti gave examples of external means. Indeed, it is not difficult to find means which do not conform to the condition of internality:

(22) Minimum $(X_i) \le$ Mean $(X_i) \le$ Maximum $(X_i)$.

As a simple illustration, suppose that there are just three measurements $X_1 = 1$, $X_2 = 1$, $X_3 = -2$. The standard deviation $\sqrt{2}$ is greater than each measurement; it is an external mean. In this case also, the estimate of scale mentioned in Theorem I is an external mean of $(X_i - \bar{X})/\sigma$. But, it may be noted that a, the estimate of scale, is an internal mean of $|X_i - \bar{X}|/\sigma$.

However, it will be shown now that the estimate k of location may be an external mean, with an externality not "removable" by the simple device of using absolute values.

And it may be noted that in the earlier paper cited, I found by the Method of Maximum Likelihood estimates of the scale a, whose externality was likewise not removable.

Theorem II. If for the function $\phi(t)$ in (8), the second moment is less than twice the square of the first moment, then the estimated location given by

(21) $k = \bar{X} - a\mu$

is an external mean of the measurements $X_i$, if these are all numerically equal, half of them positive and the other half negative.

Proof. Let the positive measurements be c, and the negative measurements be $-c$. Then $\bar{X} = 0$; also in (19), r = c. Hence from (20), $a = c/\sigma$. But by hypothesis, the second moment $\sigma^2 + \mu^2$ of $\phi(t)$ in (11) is less than $2\mu^2$, and thus $|\mu/\sigma| > 1$. Then, by (21) $k = \bar{X} - a\mu = (-c/\sigma)\mu$; and hence $|k| > c$. Either k is greater than every positive measurement c, or it is less than every negative measurement $-c$. In either case, it is an external mean.

Corollary. If in $\phi(t)$, the t is subjected to a translation $t = u + b$, so that $\phi(t) = \phi(u + b) = \psi(u)$, then it is always possible to choose b so that the second moment of $\psi(u)$ is less than twice the square of its first moment; and thus if a location $k'$ is obtained from $\psi(u)$, external means may occur. On the other hand, by proper choice of b, it is possible to make the first moment zero, so that the location becomes the arithmetic mean $\bar{X}$ of the $X_i$'s.

The first part of this corollary may be seen from (11) which states that Second Moment = $\mu^2 + \sigma^2$. Translation does not change $\sigma^2$, but it can increase $\mu^2$ indefinitely, making eventually $\mu^2 > \sigma^2$ and thus $\mu^2 + \sigma^2 < 2\mu^2$.

4. Illustration. For the Pearson Type III the simplest specification is perhaps with the origin at the start. In this case,

(23) $\Phi(t) = (p!)^{-1} e^{-t} t^{p}, \quad p > -1, \quad t \ge 0, \quad t = (x - k)/a.$

Here $p! = \Gamma(1 + p)$. Apart from this numerical factor, $\Phi(t)$ is the integrand of the Gamma function. With $\Phi(t)$ in this form, it is easily seen that the first moment is $(p + 1)$ and the second moment is $(p + 1)(p + 2)$. In the usual⁹ case, $p > 0$. Here, then, the second moment is less than twice the square of the first moment. If, then, there are an even number of measurements, all numerically equal, with half the measurements positive and the other half negative, then the estimate $k$ of location as found by the Method of Moments is an external mean of the measurements. Such conditions, while sufficient, are by no means necessary for externality.
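
The illustration can be checked numerically. The sketch below assumes, as the Summary states, that the scale estimate $a$ is the root-mean-square of the deviations from the mean divided by $\sigma$, and that $k = \bar{X} - a\mu$; the function name `moments_location_estimate` and the sample values are hypothetical choices made only for this example.

```python
import math


def moments_location_estimate(xs, p):
    """Method-of-Moments location k = mean - a*mu for Phi(t) = t**p * exp(-t) / Gamma(p + 1)."""
    n = len(xs)
    mean = sum(xs) / n
    # Root-mean-square deviation of the measurements about their mean.
    rms = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    mu = p + 1.0                               # first moment of Phi
    second = (p + 1.0) * (p + 2.0)             # second moment of Phi
    sigma = math.sqrt(second - mu ** 2)        # standard deviation of Phi, sqrt(p + 1)
    a = rms / sigma                            # scale estimate
    return mean - a * mu                       # location estimate


# Measurements all numerically equal, half positive and half negative.
xs = [2.0, -2.0, 2.0, -2.0]
k = moments_location_estimate(xs, p=1.0)
print(k, min(xs), max(xs))
```

With $p = 1$ the estimate works out to $-c\sqrt{p + 1}$, which lies below the smallest measurement $-c$, so the location estimate is external in the sense of Theorem II.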

5. Summary. Suppose that the specification for a frequency function in $x$ is $\alpha^{-1}\Phi(t')$, where $t' = (x - \kappa)/\alpha$, and that for the unknown scale $\alpha$ and location $\kappa$, estimates $a$ and $k$, respectively, are made by the Method of Moments from a set of $n$ measurements $X_i$ with arithmetic mean $\bar{X}$. Let $\sigma^2$ be the variance of $\Phi(t)$. Then the estimate $a$ is the root-mean-square of $|X_i - \bar{X}|/\sigma$, an internal mean. The estimate $k$ of the location is $\bar{X} - a\mu$, where $\mu$ is the first moment of $\Phi(t)$. This is a substitutive mean of the measurements $X_i$; and it may be external, either greater than Maximum $X_i$ or less than Minimum $X_i$.

The University of Texas

⁹ W. Palin Elderton, Frequency Curves and Correlation, Second Edition, p. 91.



ON THE CHI-SQUARE DISTRIBUTION FOR SMALL SAMPLES¹

By Paul G. Hoel

1. Introduction. The use of what is known as the $\chi^2$ distribution function for testing goodness of fit involves two types of error. One arises from the fact that the derivation of this function is based upon rough approximations, while the other arises from using the integral of this continuous function in place of summing the proper terms of a discrete set. Both of these errors become increasingly important as the sample becomes small. The purpose of this paper is to investigate the nature of this first type of error by finding a better approximation than the customary one to what might be called the exact continuous $\chi^2$ distribution function.

The method employed is that of generating or characteristic functions, and consists in expressing successively in expanded form the generating function of the multinomial, the distribution function of the multinomial, the generating function of $\chi^2$, and the distribution function of $\chi^2$. Only the first and second order terms of this final distribution function are evaluated explicitly because of the increasingly heavy algebra involved. By means of these second order terms, the nature of the error involved in the use of the customary first order approximation is investigated.


2. The Generating Function of the Multinomial. Consider $k + 1$ cells into which observations can fall, and let $p_i$ be the probability that an observation will fall in cell $i$. If $n$ observations are made, the probability that cell $i$ will contain $a_i$ of these observations is given by the multinomial

$$\frac{n!}{a_1!\,a_2!\cdots a_{k+1}!}\;p_1^{a_1}p_2^{a_2}\cdots p_{k+1}^{a_{k+1}},$$

where $\sum_{i=1}^{k+1} a_i = n$. The generating function of this multinomial can be written as²

$$M = \left[p_1e^{t_1} + \cdots + p_ke^{t_k} + p\right]^n = \Big[1 + \sum_{i=1}^{k} p_i\,(e^{t_i} - 1)\Big]^n,$$

where $a_{k+1}$ is chosen as the dependent variable and $p_{k+1}$ is written as $p$.

¹ Presented to the American Mathematical Society, April 9, 1938.

² Cf. Darmois, Statistique Mathématique, pp. 207-242, for the methods used in this and the next two paragraphs.




Let $x_i = (a_i - np_i)/\sqrt{n}$. The generating function of the $x_i$ is obtained from that of the $a_i$ by multiplying $M$ by the proper factor to shift the origin to the mean and then replacing $t_i$ by, say, $u_i/\sqrt{n}$ to compensate for the change in scale. Denoting this function by $\varphi$,

$$\varphi = e^{-\sqrt{n}\sum_{i=1}^{k} p_iu_i}\Big[1 + \sum_{i=1}^{k} p_i\,(e^{u_i/\sqrt{n}} - 1)\Big]^n.$$

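Consequently,

$$\log\varphi = -\sqrt{n}\sum_{i=1}^{k} p_iu_i + n\log\Big[1 + \sum_{i=1}^{k} p_i\,(e^{u_i/\sqrt{n}} - 1)\Big].$$

Since the range of the $u_i$ may be selected sufficiently small for convergence, the logarithm on the right may be expanded in powers of the summation, which in turn may be expanded in powers of the $u_i$. Terms containing $1/n^{q/2}$ as a factor will be homogeneous in the $u_i$ of degree $q + 2$. Writing down only the terms of order $1/n$ and lower, this double expansion gives

(1)
$$\begin{aligned}
\log\varphi ={}& \frac{1}{2}\Big[\sum_{i=1}^{k}(p_i - p_i^2)u_i^2 - 2\sum_{i<j}p_ip_ju_iu_j\Big]\\
&+ \frac{1}{\sqrt{n}}\Big[\frac{1}{6}\sum_{i}(p_i - 3p_i^2 + 2p_i^3)u_i^3 - \frac{1}{2}\sum_{i\ne j}(p_ip_j - 2p_i^2p_j)u_i^2u_j + 2\sum_{i<j<l}p_ip_jp_lu_iu_ju_l\Big]\\
&+ \frac{1}{n}\Big[\frac{1}{24}\sum_{i}(p_i - 7p_i^2 + 12p_i^3 - 6p_i^4)u_i^4 - \frac{1}{6}\sum_{i\ne j}(p_ip_j - 6p_i^2p_j + 6p_i^3p_j)u_i^3u_j\\
&\qquad - \frac{1}{4}\sum_{i<j}(p_ip_j - 2p_i^2p_j - 2p_ip_j^2 + 6p_i^2p_j^2)u_i^2u_j^2 + \sum_{\substack{j<l\\ i\ne j,\,l}}(p_ip_jp_l - 3p_i^2p_jp_l)u_i^2u_ju_l\\
&\qquad - 6\sum_{i<j<l<m}p_ip_jp_lp_mu_iu_ju_lu_m\Big] + \cdots
\end{aligned}$$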


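Hence $\varphi$ can be written in the form

(2) $\varphi = \varphi_0\left[1 + \dfrac{A_1}{\sqrt{n}} + \dfrac{A_2}{n} + \cdots\right],$

where $\varphi_0$ is the exponential of the terms of (1) which are free of $n$, $A_1$ is the coefficient of $1/\sqrt{n}$ in (1), $A_2$ is the sum of the coefficient of $1/n$ and $A_1^2/2$, etc.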
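
As a rough numerical check on expansion (1), the sketch below compares the exact value of $\log\varphi$ with the series truncated after the $1/n$ terms. The coefficients are the fourth-order multinomial cumulants as they follow from differentiating $\log[1 + \sum p_i(e^{t_i} - 1)]$; the function names and the test values of $p_i$ and $u_i$ are assumptions made only for this illustration.

```python
import math
from itertools import combinations, permutations


def log_phi_exact(p, u, n):
    """Exact log of the generating function of x_i = (a_i - n p_i) / sqrt(n)."""
    root_n = math.sqrt(n)
    linear = sum(pi * ui for pi, ui in zip(p, u))
    s = sum(pi * math.expm1(ui / root_n) for pi, ui in zip(p, u))
    return -root_n * linear + n * math.log1p(s)


def log_phi_series(p, u, n):
    """Expansion (1) truncated after the terms of order 1/n."""
    idx = range(len(p))
    order0 = 0.5 * (
        sum((p[i] - p[i] ** 2) * u[i] ** 2 for i in idx)
        - 2 * sum(p[i] * p[j] * u[i] * u[j] for i, j in combinations(idx, 2))
    )
    order_half = (
        sum((p[i] - 3 * p[i] ** 2 + 2 * p[i] ** 3) * u[i] ** 3 for i in idx) / 6
        - sum((p[i] * p[j] - 2 * p[i] ** 2 * p[j]) * u[i] ** 2 * u[j]
              for i, j in permutations(idx, 2)) / 2
        + 2 * sum(p[i] * p[j] * p[l] * u[i] * u[j] * u[l]
                  for i, j, l in combinations(idx, 3))
    )
    order_one = (
        sum((p[i] - 7 * p[i] ** 2 + 12 * p[i] ** 3 - 6 * p[i] ** 4) * u[i] ** 4
            for i in idx) / 24
        - sum((p[i] * p[j] - 6 * p[i] ** 2 * p[j] + 6 * p[i] ** 3 * p[j])
              * u[i] ** 3 * u[j] for i, j in permutations(idx, 2)) / 6
        - sum((p[i] * p[j] - 2 * p[i] ** 2 * p[j] - 2 * p[i] * p[j] ** 2
               + 6 * p[i] ** 2 * p[j] ** 2) * u[i] ** 2 * u[j] ** 2
              for i, j in combinations(idx, 2)) / 4
        # Sum over i and over pairs j < l with both j and l different from i.
        + sum((p[i] * p[j] * p[l] - 3 * p[i] ** 2 * p[j] * p[l])
              * u[i] ** 2 * u[j] * u[l]
              for i in idx
              for j, l in combinations([m for m in idx if m != i], 2))
        - 6 * sum(p[i] * p[j] * p[l] * p[m] * u[i] * u[j] * u[l] * u[m]
                  for i, j, l, m in combinations(idx, 4))
    )
    return order0 + order_half / math.sqrt(n) + order_one / n


# Five cells (k = 4); the fifth probability is p = 1 - sum(p) = 0.25.
p = [0.1, 0.2, 0.3, 0.15]
u = [0.3, -0.2, 0.5, 0.1]
err_100 = abs(log_phi_exact(p, u, 100) - log_phi_series(p, u, 100))
err_10000 = abs(log_phi_exact(p, u, 10000) - log_phi_series(p, u, 10000))
print(err_100, err_10000)
```

The neglected terms begin at order $n^{-3/2}$, so the discrepancy should shrink quickly as $n$ grows.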





3. The Multinomial Distribution Function. If a distribution function can be expressed as

( 3 ) fa, + 

where $f_0$ is of the form $Ke^{-\frac{1}{2}\sum c_{ij}x_ix_j}$ with $|c_{ij}|$ positive definite, then its generating function³ can be written as


(4) $F(u_1, \cdots, u_k) = F_0\Big[1 + \sum_{i=1}^{k}\alpha_iu_i + \sum\beta_{ij}u_iu_j + \cdots\Big],$


where $F_0$ is the generating function of $f_0$ and is of the form $e^{\frac{1}{2}\sum a_{ij}u_iu_j}$ with $|a_{ij}|$ positive definite. Conversely,⁴ if the generating function of a continuous distribution is of the form (4), then the distribution function can be expressed by means of (3). This relationship may be applied to (2) since it can be shown to be of the form (4).

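The coefficients $c_{ij}$ of $f_0$ corresponding to $\varphi_0$ can be determined by making use of the fact that the moments of $f_0$ can be evaluated directly by integration or indirectly by differentiation of $\varphi_0$. It is sufficient here to equate expressions for second moments; thus

$$\frac{\partial^2\varphi_0}{\partial u_s\,\partial u_t}\bigg|_{u=0} = \int\!\cdots\!\int x_sx_tf_0\,dx_1\cdots dx_k.$$

Now

$$\frac{\partial^2\varphi_0}{\partial u_s\,\partial u_t}\bigg|_{u=0} = \begin{cases} -p_sp_t, & s \ne t,\\ p_s - p_s^2, & s = t.\end{cases}$$

The value of the integral is known⁵ to be $c^{st}$, the reciprocal of the element $c_{st}$ in the determinant $|c_{ij}|$. Hence

$$c^{st} = \begin{cases} -p_sp_t, & s \ne t,\\ p_s - p_s^2, & s = t.\end{cases}$$

But $c_{st}$ can be obtained from $c^{st}$, since it is given by the reciprocal of $c^{st}$. Thus $c_{st} = C^{st}/|c^{st}|$, where $C^{st}$ denotes the cofactor of element $c^{st}$ in $|c^{st}|$.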


³ Darmois, loc. cit., p. 342.

⁴ See, for example, S. Kullback, Annals of Mathematical Statistics, vol. 5 (1934), pp. 263-307.

⁵ See, for example, Risser and Traynard, Les Principes de la Statistique Mathématique, p. 226.


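$$|c^{st}| = \begin{vmatrix} p_1 - p_1^2 & -p_1p_2 & \cdots & -p_1p_k\\ -p_2p_1 & p_2 - p_2^2 & \cdots & -p_2p_k\\ \vdots & \vdots & & \vdots\\ -p_kp_1 & -p_kp_2 & \cdots & p_k - p_k^2 \end{vmatrix} = (-1)^k p_1^2p_2^2\cdots p_k^2 \begin{vmatrix} 1 - \frac{1}{p_1} & 1 & \cdots & 1\\ 1 & 1 - \frac{1}{p_2} & \cdots & 1\\ \vdots & \vdots & & \vdots\\ 1 & 1 & \cdots & 1 - \frac{1}{p_k} \end{vmatrix}.$$

This determinant may be evaluated by subtracting the last column from each of the others and then expanding by minors of the last row. Thus

$$|c^{st}| = (-1)^k p_1^2\cdots p_k^2\,(-1)^k\,\frac{1 - \sum_{i=1}^{k}p_i}{p_1\cdots p_k} = p_1p_2\cdots p_kp,$$

since $\sum_{i=1}^{k} p_i = 1 - p$ from probability considerations. To evaluate $C^{st}$, delete row $s$ and column $t$ in $|c^{st}|$; then shift row $t$ to the last row and column $s$ to the last column. These shifts, together with the sign of the cofactor, change the sign of the resulting expression; hence

$$C^{st} = p_1p_2\cdots p_k,$$

provided $s \ne t$. Since $C^{ss}$ is merely $|c^{st}|$ after row $s$ and column $s$ have been deleted, it may be evaluated exactly as was $|c^{st}|$. Thus

$$C^{ss} = \frac{p_1p_2\cdots p_k}{p_s}\,(p + p_s).$$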



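Combining these results, $c_{st} = \dfrac{1}{p}$ for $s \ne t$ and $c_{ss} = \dfrac{1}{p_s} + \dfrac{1}{p}$; and therefore

(5) $f_0 = (2\pi)^{-k/2}(p_1p_2\cdots p_kp)^{-1/2}\exp\Big\{-\dfrac{1}{2}\Big[\sum_{i=1}^{k}\dfrac{x_i^2}{p_i} + \dfrac{1}{p}\Big(\sum_{i=1}^{k}x_i\Big)^2\Big]\Big\}.$

By computing the necessary derivatives of $f_0$, the explicit form of $f$, given by (3), can be obtained to the desired number of terms. Since such derivatives contain $f_0$ as a factor, $f$ may be written as

(6) $f = f_0\left[1 + \dfrac{B_1}{\sqrt{n}} + \dfrac{B_2}{n} + \cdots\right],$

where $B_i$ is obtained from $A_i$ of (2) by replacing terms in the $u_i$ with the corresponding derivative of $f_0$ and then factoring out $f_0$.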
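
The inversion carried out in this section can be verified numerically: the matrix with elements $p_s - p_s^2$ and $-p_sp_t$ should have as its reciprocal the matrix with elements $1/p_s + 1/p$ and $1/p$. The following sketch multiplies the two matrices for an arbitrary set of probabilities; the helper names and the probabilities are assumptions made for the example.

```python
def moment_matrix(p):
    """Second moments c^{st}: p_s - p_s^2 on the diagonal, -p_s p_t elsewhere."""
    k = len(p)
    return [[p[s] - p[s] ** 2 if s == t else -p[s] * p[t] for t in range(k)]
            for s in range(k)]


def quadratic_form_matrix(p):
    """Claimed reciprocal c_{st}: 1/p_s + 1/p on the diagonal, 1/p elsewhere."""
    rest = 1.0 - sum(p)  # probability p of the (k + 1)-th cell
    k = len(p)
    return [[1.0 / p[s] + 1.0 / rest if s == t else 1.0 / rest for t in range(k)]
            for s in range(k)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


p = [0.1, 0.2, 0.3, 0.15]
product = matmul(moment_matrix(p), quadratic_form_matrix(p))
for row in product:
    print([round(v, 12) for v in row])
```

The printed product should be the identity matrix up to rounding, which is the content of the statement $c_{st} = C^{st}/|c^{st}|$ for these elements.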


4. The Generating Function of $\chi^2$. Let this function be denoted by



consequently $\chi^2$ is, except for a factor, the quadratic form in $f_0$. Accordingly, letting $\theta = 1 - 2t$,

s ,x Vo * =* 

and hence 


0(0 



• • rfZfr. 


Letting $x_i = z_i/\sqrt{\theta}$ and denoting the value of $B_i$ after this substitution by $C_i$,


m 


0) 


j * 1 * j ft 1 “ *1” ~ * • • • Jffei • • d%h , 


**' ! + ]■£•>■ ••• . 


since the terms involving odd powers of $1/\sqrt{n}$ are of odd degree in the $z_i$ and therefore vanish upon integration.

For the purposes of this paper only the integral which is the coefficient of $1/n$ needs to be evaluated. Since the algebra involved in this evaluation is heavy and the formulas become exceedingly long, only a few terms will be written out explicitly to indicate the procedure followed.





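From (1), (2), and (6) it is clear that only fourth and sixth order derivatives of $f_0$ are needed. As examples,

$$\frac{\partial^4f_0}{\partial x_i^4} = f_0\Big[D_i^4 - 6D_i^2\Big(\frac{1}{p_i} + \frac{1}{p}\Big) + 3\Big(\frac{1}{p_i} + \frac{1}{p}\Big)^2\Big],$$

$$\frac{\partial^6f_0}{\partial x_i^6} = f_0\Big[D_i^6 - 15D_i^4\Big(\frac{1}{p_i} + \frac{1}{p}\Big) + 45D_i^2\Big(\frac{1}{p_i} + \frac{1}{p}\Big)^2 - 15\Big(\frac{1}{p_i} + \frac{1}{p}\Big)^3\Big],$$

where $D_i = \dfrac{x_i}{p_i} + \dfrac{x_1 + x_2 + \cdots + x_k}{p}$. Following the procedure indicated in (6) and (7), this integral becomes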

jL f ’ / ~ 7p * + 12p * ~ 

H-’tG+sMJ+s)'] 

+ (similar terms of degree 4 and Lower in the D<) 

(8) 1 * 

+ ~ £ [p? - 6p5 + I3pi - ISp! + 4p?] 

72 rf -1 


+ (similar terms of degree 6 and lower in the £<) j dz\ • • dz* 

When $\theta = 1$, the integral reduces to that of $f_0B_2$, which in turn is the integral of a linear combination of derivatives of $f_0$. But the integral of such a derivative vanishes. As a result, if the integral of $f_0D_i^2$ has been computed directly, that of $f_0D_i^4$ and then that of $f_0D_i^6$ can be found indirectly by equating the corresponding bracket to zero for $\theta = 1$. Similarly for the other terms of the above integral. As examples,




JoD*tfei... da* « - + 

V Vi 


JL Ij ,dua 6 + 5 )* 

Upon evaluating all such integrals, (8) reduces to

§4 § “ 7 *' + 12p! - 9p!) 3^ + (j - 1) 

-f (^similar terms all containing as a factor^ 

+ i t W - flpi + 13p! - 12p! + 4pJ) 15 - lj 

' -f ^similar terms all containing ^ as a factor^ 





In order to interpret these results, it is necessary to condense these various sums of probability expressions. If the terms are arranged in descending powers of the $p_i$, it will be discovered that certain combinations condense readily. The condensation in each case lies in recognizing combinations like

$$\sum_{i}p_i^4 + 4\sum_{i\ne j}p_i^3p_j + 6\sum_{i<j}p_i^2p_j^2 + 12\sum_{\substack{j<l\\ i\ne j,\,l}}p_i^2p_jp_l + 24\sum_{i<j<l<m}p_ip_jp_lp_m = \Big(\sum_{i}p_i\Big)^4.$$

However, some of the terms resulting from multiplying by $1/p_i$ above cannot be condensed in this fashion until they have been reduced to familiar sums by using relationships of the following type:

$$\sum_{i<j}p_ip_j\Big(\frac{1}{p_i} + \frac{1}{p_j}\Big) = (k - 1)\sum_{i=1}^{k}p_i,$$

$$\sum_{i<j<l}p_ip_jp_l\Big(\frac{1}{p_i} + \frac{1}{p_j} + \frac{1}{p_l}\Big) = (k - 2)\sum_{i<j}p_ip_j.$$

After all possible condensations have been made, (9) reduces to

G-rffij-*—*] 

+k-')a[ , £i-" r+m+,, \ 

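As a result, the generating function of $\chi^2$ can be written as

(10)
$$\begin{aligned}
G(t) ={}& \theta^{-\frac{k}{2}} + \frac{S_1}{n}\Big(\theta^{-\frac{k+4}{2}} - 2\theta^{-\frac{k+2}{2}} + \theta^{-\frac{k}{2}}\Big)\\
&+ \frac{S_2}{n}\Big(\theta^{-\frac{k+6}{2}} - 3\theta^{-\frac{k+4}{2}} + 3\theta^{-\frac{k+2}{2}} - \theta^{-\frac{k}{2}}\Big)\\
&+ (\text{terms involving higher powers of } 1/n),
\end{aligned}$$

where $S_1 = \dfrac{1}{8}\Big[\sum_{i=1}^{k+1}\dfrac{1}{p_i} - (k^2 + 4k + 1)\Big]$ and $S_2 = \dfrac{1}{24}\Big[5\sum_{i=1}^{k+1}\dfrac{1}{p_i} - (3k^2 + 12k + 5)\Big]$.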


5. The Distribution Function of $\chi^2$. It is well known that $\theta^{-k/2} = (1 - 2t)^{-k/2}$ is the generating function of what is commonly called the $\chi^2$ distribution function with $k$ degrees of freedom. If this distribution function is denoted by $F_k(\chi^2)$, then the distribution function corresponding to (10) can be written as

(11)
$$F_k + \frac{S_1}{n}\,(F_{k+4} - 2F_{k+2} + F_k) + \frac{S_2}{n}\,(F_{k+6} - 3F_{k+4} + 3F_{k+2} - F_k) + (\text{terms involving higher powers of } 1/n).$$


The customary test for goodness of fit involves the integral of $F_k(\chi^2)$ from $\chi^2$ to $\infty$, which has been tabled for values of $\chi^2$ and $k$. The form of (11) is such that the integral of the term in $1/n$ is easily evaluated by means of this same table. However, for more accurate results and for theoretical reasons, it is more elucidating to express these integrals in a more compact form. This is accomplished by using familiar⁶ expansions for the integral of $F_k(\chi^2)$. Denoting the integral of the explicit terms of (11) by $P$, it is easy to show that

(12) $P = P_1 + \dfrac{1}{n}\,[R_1S_1 + R_2S_2],$

where $P_1$ is the customary tabled value for $k$ degrees of freedom and

(13)
$$R_1 = \frac{\chi^{k}e^{-\frac{1}{2}\chi^2}}{2\cdot 4\cdots(k + 2)}\,\big[\chi^2 - (k + 2)\big],$$
$$R_2 = \frac{\chi^{k}e^{-\frac{1}{2}\chi^2}}{2\cdot 4\cdots(k + 4)}\,\big[\chi^4 - 2(k + 4)\chi^2 + (k + 4)(k + 2)\big],$$

for $k$ even, while for $k$ odd both $R_1$ and $R_2$ contain an additional factor of $\sqrt{2/\pi}$ and have $1\cdot 3\cdots(k + 2)$ and $1\cdot 3\cdots(k + 4)$ respectively for denominators.


6. Conclusions. In any given problem the second approximation $P$ can be calculated easily by means of either (11) or (12) and compared with the customary first approximation $P_1$. However, the magnitude of this correction term is of primary interest when $\chi^2$ is near a significance level and when one or more of $n$, $k$, and $p_i$ is small because the accuracy of $P_1$ is questioned in those cases.

For $\chi^2$ at the .05 level and for $2 \le k \le 16$, it is easily shown that $0 < R_1 < .08$ and $-.08 < R_2 < 0$. Clearly $S_2$ is positive, while $S_1$ will be positive if one or more of the $p_i$ is sufficiently small. Consequently, for those cases of particular interest, the correction term is surprisingly small partly because $R_1$ and $R_2$ are so small and partly because they are of opposite sign.

To illustrate this viewpoint consider the following numerical example. Let $n = 10$, $k = 4$, $\chi^2 = 9.488$, $p_1 = p_2 =$ -sfo, $p_3 = p_4 = p_5 =$ A. Then $S_1 = 2.23$, $S_2 = 6.38$, $R_1 = .056$, $R_2 = -.027$, $P_1 = .05$, and $P = .045$. The correction term of $-.005$ is very small in spite of the fact that this example is an extreme case to which the customary $\chi^2$ test would not be applied.
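
The arithmetic of this example can be retraced with a short sketch of (12) and (13) for even $k$. The values of $S_1$ and $S_2$ are taken as quoted in the text rather than recomputed, since the individual $p_i$ are not legible here; the function names are hypothetical.

```python
import math


def even_product(m):
    """Product 2 * 4 * ... * m for an even positive integer m."""
    result = 1
    for factor in range(2, m + 1, 2):
        result *= factor
    return result


def correction_factors(chi2, k):
    """R_1 and R_2 of (13) for an even number k of degrees of freedom."""
    base = chi2 ** (k / 2) * math.exp(-chi2 / 2)  # chi^k * exp(-chi^2 / 2)
    r1 = base / even_product(k + 2) * (chi2 - (k + 2))
    r2 = base / even_product(k + 4) * (
        chi2 ** 2 - 2 * (k + 4) * chi2 + (k + 4) * (k + 2))
    return r1, r2


def second_approximation(p1, n, s1, s2, r1, r2):
    """P of (12): the tabled value plus the 1/n correction."""
    return p1 + (r1 * s1 + r2 * s2) / n


r1, r2 = correction_factors(chi2=9.488, k=4)
p_second = second_approximation(p1=0.05, n=10, s1=2.23, s2=6.38, r1=r1, r2=r2)
print(round(r1, 4), round(r2, 4), round(p_second, 4))
```

Evaluated this way, $R_1$ and $R_2$ come out close to the quoted .056 and $-.027$ (they differ slightly in the third decimal), and $P$ agrees with the quoted .045 to three decimals.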

As judged by the second order approximation obtained in this paper, the actual error committed by using the customary first approximation is much smaller than the order of the neglected terms would indicate, and therefore the range of applicability of $P_1$ is wider than has been supposed. However, this investigation has considered only the error due to rough approximations and leaves untouched the second type of error indicated in the introductory paragraph.

Oregon State College

⁶ Risser and Traynard, loc. cit., p. 261.



SHORTEST AVERAGE CONFIDENCE INTERVALS FROM LARGE SAMPLES

By S. S. Wilks

1. Introduction. The method of fiducial argument [1, 2] in statistics has gained considerable prominence within the last few years as a method of inferring the values of population parameters from samples "randomly drawn" from populations having distribution laws of known functional forms. The method has also been shown to be applicable [2] to the problem of inferring the values of statistical functions in samples from samples already observed, assuming all samples to be drawn from a population with a distribution law of a given functional form.

The main ideas of a procedure which is sufficient for carrying out fiducial inference for certain cases of a single population parameter may be summed up in the following steps:

(a) A sample is assumed to be "randomly drawn" from a population with a distribution law $f(x, \theta)$ of known functional form.

(b) A function $\psi(x_1, x_2, \cdots, x_n, \theta)$ of the sample values $x_1, x_2, \cdots, x_n$ and parameter $\theta$ is devised, which is a monotonic function of $\theta$ for a given sample, so that the sampling distribution $G(\psi_0)\,d\psi_0$ of $\psi(x_1, x_2, \cdots, x_n, \theta_0) = \psi_0$, say, in samples from the population with $\theta = \theta_0$ is independent of $\theta_0$ and the $x$'s except as they enter into $\psi_0$.

(c) For a given probability $\alpha$ a pair of numbers $\psi'_\alpha$ and $\psi''_\alpha$ is chosen so that when $\theta = \theta_0$, the probability that $\psi'_\alpha < \psi_0 < \psi''_\alpha$ is $1 - \alpha$, or more briefly,

(1) $P(\psi'_\alpha < \psi_0 < \psi''_\alpha \mid \theta = \theta_0) = 1 - \alpha$

which can be stated in the alternative form

(2) $P(\underline{\theta} < \theta_0 < \bar{\theta} \mid \theta = \theta_0) = 1 - \alpha.$

(d) $\underline{\theta}$ and $\bar{\theta}$, being functions of $\psi'_\alpha$, $\psi''_\alpha$ and the sample, are subject to sampling fluctuations and it can be stated that the probability is $1 - \alpha$ that they will include the true value of $\theta$, whatever it may be, that is, $\theta_0$, between them. The statement holds for all values which $\theta_0$ may take on.

The numbers $\underline{\theta}$ and $\bar{\theta}$ are known as fiducial or confidence limits [3] of $\theta_0$ and $(\underline{\theta}, \bar{\theta})$ a confidence interval for the confidence coefficient $1 - \alpha$. We therefore have the following rule for making inferences about the unknown number $\theta_0$ once $\psi$ has been found: For a given sample solve the equations

$$\psi(x_1, x_2, \cdots, x_n, \theta_0) = \psi'_\alpha, \qquad \psi(x_1, x_2, \cdots, x_n, \theta_0) = \psi''_\alpha$$

for $\theta_0$. Let $\underline{\theta}$ and $\bar{\theta}$ be the two values of $\theta_0$ formally obtained. The statement that $\underline{\theta}$ and $\bar{\theta}$ will include the value of $\theta$ in the population actually sampled, if consistently made in each of an aggregate of cases involving populations having distributions of the same functional form $f(x, \theta)$, will be correct (theoretically) in $100(1 - \alpha)$ per cent of the cases.

If $\psi$ is a function of statistics $t_1$ and $t_2$ of two samples from a population of known functional form, which is monotonic in each $t$ for given values of the other, then one can argue fiducially about values of one $t$ from values of the other one.

For a finite value of $n$ and discrete distributions $f(x, \theta)$, it is not possible to carry through steps (b), (c), (d) as they are now stated. However, under certain conditions, it is possible to carry out a procedure for the discrete case which will allow one to say

(3) $P(\underline{\theta} < \theta_0 < \bar{\theta} \mid \theta = \theta_0) \ge 1 - \alpha.$


$\psi$ functions which have the property that their sampling distributions are independent of $\theta$ and the $x$'s for a given distribution $f(x, \theta)$ are not, in general, unique. The question then arises as to how (if possible) one can choose $\psi$ functions and limits $\psi'_\alpha$ and $\psi''_\alpha$ so as to get confidence intervals for a given $\alpha$, which are shortest or "best" in some sense. Neyman [4] has investigated the problem of obtaining "best" confidence intervals for the case of small samples. The object of this paper is to consider the problem for large samples. Under fairly general conditions it will be seen that a rather simple asymptotic solution exists for the large-sample case, which is connected in an essential manner with

2. An asymptotic distribution. Suppose a population $\Pi$ has a distribution function $f(x, \theta)$, where $x$ is a random variable and $\theta$ a parameter. Actually, $f(x, \theta)$ may involve several other parameters whose values may be regarded as fixed throughout the paper. The problem of arguing fiducially about several parameters simultaneously will not be considered in this paper. In order to include the case of a discrete as well as a continuous variate $x$, we shall consider the cumulative distribution function (c.d.f.) $F(x, \theta)$, which is monotonic and is such that

$$F(-\infty, \theta) = 0, \quad F(+\infty, \theta) = 1, \quad F(x + c, \theta) \ge F(x, \theta), \quad F(x + 0, \theta) = F(x, \theta),$$

for $c > 0$, and $a < \theta < b$. Thus, $F(x', \theta) = P(x \le x' \mid \theta)$. In the case of a continuous variate $x$, where $f(x, \theta)$ is a probability density function, then $dF(x, \theta) = f(x, \theta)\,dx$; in the discrete case $dF(x, \theta) = f(x, \theta)$ where $f(x, \theta)$ is the probability that variate $x$ takes the value indicated. We shall be interested in continuous functions $\varphi(x)$ for which the integral $\int \varphi(x)\,dF(x, \theta)$ taken in the Stieltjes sense exists. Limits on integral signs are understood to be $-\infty$ and $\infty$.





Now consider a sample $O_n$ of $n$ individuals independently drawn from $\Pi_0$, the population for which the c.d.f. is $F(x, \theta_0)$. Let the values of $x$ in the sample be $x_1, x_2, \cdots, x_n$. The probability element associated with the sample is

(4) $dP_n = \prod_{i=1}^{n} dF(x_i, \theta_0).$

Let $L = \log dP_n$. Then assuming that $\dfrac{\partial}{\partial\theta}\log dF(x, \theta) = g(x, \theta)$, say, exists for $\theta = \theta_0$, and for each $x$ (except for a set of probability 0), we have

(5) $\Big(\dfrac{\partial L}{\partial\theta}\Big)_0 = \sum_{i=1}^{n} g(x_i, \theta_0).$

In all ordinary problems in statistics $g(x, \theta)$ reduces to $\dfrac{\partial}{\partial\theta}\log f(x, \theta)$ where $f(x, \theta)$ is probability in the case where $x$ is a discrete random variable and probability density in the case of a continuous random variable. Let $g_0$ denote $g(x, \theta_0)$ and let $\Big(\dfrac{\partial L}{\partial\theta}\Big)_0$ denote $\dfrac{\partial L}{\partial\theta}$ with $\theta = \theta_0$. Let

(6) $A_0^2 = E_0[(g_0)^2] = \int g_0^2\,dF(x, \theta_0).$

$E_0(\varphi)$ will be used to denote the mathematical expectation of $\varphi$ in samples from $\Pi_0$, i.e. when the population distribution is $dF(x, \theta_0)$. We shall consider the sampling theory of

(7) $\psi_0 = \dfrac{\Big(\dfrac{\partial L}{\partial\theta}\Big)_0}{\sqrt{n}\,A_0}$

in large samples from $\Pi_0$.

Let $\varphi^{(n)}(t)$ be the characteristic function of $\psi_0$ for samples from $\Pi_0$; it is defined by $E_0(e^{it\psi_0})$. Then

(8) 

" H 1 i ,i( * + *°]} 

where $\xi_1$ and $\xi_2$ are real functions of $t$, $x$, $n$ and $\theta_0$, such that if $|E_0(g_0)^3| < K < \infty$ (i.e. the third moment of $g_0$ is finite when $\theta = \theta_0$), for $a < \theta_0 < b$, then $E_0[\xi_1]$ and $E_0[\xi_2]$ are uniformly bounded for some $t$-interval $\delta$ (which includes $t = 0$ as an interior point) for $n$ larger than some $n_0$ and for $\theta_0$ on any fixed subinterval of the interval $(a, b)$. Suppose that $F(x, \theta)$ is such that

(9) $\displaystyle\int \frac{\partial}{\partial\theta}\,dF(x, \theta) = \frac{\partial}{\partial\theta}\int dF(x, \theta) = 0, \qquad a < \theta < b.$

This condition implies that the range of $x$ be independent of $\theta$.


If $n$ is allowed to increase indefinitely, then we have at once that $\varphi^{(n)}(t)$ tends to $e^{-\frac{1}{2}t^2}$ uniformly in the interval $\delta$. We now make use of a theorem [5] which states that if an unlimited sequence of random variables $x^{(1)}, x^{(2)}, \cdots, x^{(n)}, \cdots$ with c.d.f.'s $F^{(1)}(x), F^{(2)}(x), \cdots, F^{(n)}(x), \cdots$ have corresponding characteristic functions $\varphi^{(1)}(t), \varphi^{(2)}(t), \cdots, \varphi^{(n)}(t), \cdots$ then a necessary and sufficient condition for $F^{(n)}(x)$ to converge uniformly to a c.d.f. $F(x)$ at each point of continuity of $F(x)$ on the interval $(-\infty, \infty)$ is that the sequence of characteristic functions converge uniformly to a function $\varphi_1(t)$ on an interval $|t| < \epsilon$ for some $\epsilon > 0$. The characteristic function $\varphi(t)$ associated with $F(x)$ will then be identical with $\varphi_1(t)$ and the sequence $\varphi^{(1)}(t), \varphi^{(2)}(t), \cdots$ converges to $\varphi(t)$ uniformly in every finite $t$-interval.

From this theorem it follows at once that, since $e^{-\frac{1}{2}t^2}$ is the characteristic function of a variate distributed normally with mean 0 and variance 1, the asymptotic c.d.f. of $\psi_0$ for large samples is given by

(10) $\displaystyle\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\psi_0} e^{-\frac{1}{2}x^2}\,dx.$

We may conveniently summarize the foregoing results in the following

Theorem 1. Let $x_1, x_2, \cdots, x_n$ be the values of $x$ in a sample of $n$ independently drawn individuals from a population $\Pi_0$ which has a c.d.f. $F(x, \theta_0)$, such that for $a < \theta_0 < b$,

(a) $\dfrac{\partial}{\partial\theta}\,dF(x, \theta)$ exists for all $x$'s except possibly for a set of probability 0;

(b) $E_0[(g_0)^3]$ is finite, for $n > n_0$;

(c) condition (9) is satisfied.

Then the asymptotic c.d.f. of $\psi_0$ for large samples defined in (7) is given by (10).

The statistical significance of this Theorem is that if we know the functional form $f(x, \theta)$ (for which the first derivative $f'(x, \theta)$ with respect to $\theta$ exists) of the distribution function of a population $\Pi$ and if the sample $x_1, x_2, \cdots, x_n$ is "randomly drawn" from $\Pi_0$, then the quantity

(11) $\psi_0 = \dfrac{1}{\sqrt{n}\,A_0}\sum_{i=1}^{n}\dfrac{\partial}{\partial\theta_0}\log f(x_i, \theta_0)$

is a random variable which is approximately normally distributed with mean 0 and variance 1 in repeated large samples. It will be noticed that the quantity in the numerator of (11) is simply the derivative with respect to $\theta$, at $\theta = \theta_0$, of the logarithm of the likelihood of $\theta$ for the given sample. $\psi_0$ is a function of the sample $O_n$ and the true value $\theta_0$ of the parameter $\theta$, and the thing that makes $\psi_0$ a random variable is the random nature of the sample; $\theta_0$ is a fixed but unknown number. Thus, for example, when $1 - \alpha = .95$ in (1) and knowing that we have "randomly drawn" a large sample $O_n$ from a population $\Pi_0$ with distribution $f(x, \theta_0)$ of known functional form, we can say that the probability is .95 that the sample will produce a value of $\psi_0$ in the interval $-1.96$ to $+1.96$, that is,

(12) $P(-1.96 < \psi_0 < 1.96 \mid \theta = \theta_0) = .95.$

This statement holds, whatever may be the value of the unknown $\theta_0$. Now, the inequality $-1.96 < \psi_0 < 1.96$ is equivalent to the inequality $\underline{\theta} < \theta_0 < \bar{\theta}$ because of the monotonic nature of $\psi_0$ as a function of $\theta_0$. Hence (12) is equivalent to

(13) $P(\underline{\theta} < \theta_0 < \bar{\theta} \mid \theta = \theta_0) = .95$

where $\underline{\theta}$ and $\bar{\theta}$ are obtained by solving $\psi_0 = \pm 1.96$ for $\theta_0$. The fiducial limits $\underline{\theta}$ and $\bar{\theta}$ will thus be functions of the sample and will be subject to sampling variations. In general, of course, one could choose any probability level $1 - \alpha$ and find $\psi_\alpha$ so that

(14) $P(-\psi_\alpha < \psi_0 < \psi_\alpha \mid \theta = \theta_0) = 1 - \alpha,$

from which fiducial limits for $\theta_0$ can be found as before.

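The extension of Theorem 1 to the case in which the distribution function of the population $\Pi$ involves several parameters $\theta_1, \theta_2, \cdots, \theta_h$ having values in some region $R$ of the space of $\theta$'s, is immediate. $\Pi_0$ in this case would be specified by the values $\theta_{10}, \theta_{20}, \cdots, \theta_{h0}$. In fact, we can state the situation as

Theorem 1': Let $F(x, \theta_1, \theta_2, \cdots, \theta_h)$ denote the c.d.f. of $x$ and (allowing $i, j, k$ to take on values $1, 2, \cdots, h$) let

$$\psi_{i0} = \frac{1}{\sqrt{n}}\Big(\frac{\partial L}{\partial\theta_i}\Big)_0, \qquad L = \sum_{\alpha=1}^{n}\log dF(x_\alpha, \theta_{10}, \theta_{20}, \cdots, \theta_{h0}),$$

$$g_i = \frac{\partial}{\partial\theta_i}\log dF(x, \theta_1, \cdots, \theta_h),$$

$$A_{ij} = E_0[g_{i0}\,g_{j0}], \text{ where } g_{i0} = g_i \text{ with } \theta_i = \theta_{i0}.$$

If, in $R$,

(a) $\dfrac{\partial}{\partial\theta_i}\,dF(x, \theta_1, \cdots, \theta_h)$ exists for all $x$'s except possibly for a set of probability 0;

(b) $E_0(g_{i0}\,g_{j0}\,g_{k0})$ are all finite;

(c) $\displaystyle\int\frac{\partial}{\partial\theta_i}\,dF(x, \theta_1, \theta_2, \cdots, \theta_h) = \frac{\partial}{\partial\theta_i}\int dF(x, \theta_1, \theta_2, \cdots, \theta_h) = 0$;

(d) $\|A_{ij}\|$ is non-singular;

then the asymptotic distribution of the $\psi_{i0}$ in large samples from $\Pi_0$ is a normal multivariate distribution with matrix $\|A_{ij}\|$ of variances and covariances, and zero means.

A similar theorem holds for the case in which $\Pi$ is a multivariate population in addition to having several parameters.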

The question now arises: In what sense is the confidence interval between $\underline{\theta}$ and $\bar{\theta}$ as determined from $\psi_0$ "best"? It will be shown that the average rate of change of $\psi_0$ with respect to $\theta$ at $\theta = \theta_0$ is greater than that for a rather broad class of functions of the $\psi_0$ type, that is, functions of the observations and $\theta$ which are asymptotically normally distributed. Since we are dealing with large samples, we are only interested in values of $\theta$ in the neighborhood of $\theta_0$, for which $\psi$ as a function of $\theta$ is approximately linear, and demonstrating the property just stated regarding the average rate of change of $\psi_0$ with respect to $\theta$ at $\theta = \theta_0$ is equivalent to showing that the two "values" of $\theta_0$ for which $\psi_0 = \pm\psi_\alpha$ will, on the average, be closer together than those computed from any other $\psi$ function than $\psi_0$ of the class of functions to be considered. This class of functions will be designated as belonging to class C, and will now be more accurately defined.


3. Functions of class C and their asymptotic distributions. Following an argument similar to that used in proving Theorem 1, we can readily prove

Theorem 2: Let $h(x, \theta)$ be a function in which $x$ has the c.d.f. $F(x, \theta)$, and

(15) (a) $E_0[h(x, \theta_0)] = 0$;

(b) $E_0[(h(x, \theta_0))^3]$ is finite, for $n > n_0$.

Let

(16) $A_h^2 = E_0[\{h(x, \theta_0)\}^2]$

and for a sample of values $x_1, x_2, \cdots, x_n$ let

(17) $\psi_h = \dfrac{\sum_{i=1}^{n} h(x_i, \theta_0)}{\sqrt{n}\,A_h}.$

Then the asymptotic c.d.f. of $\psi_h$ for large samples from $\Pi_0$ is given by (10).

We shall designate as belonging to class C any function $\psi_h$ made up according to the rule expressed by (17), of functions $h(x, \theta)$ satisfying (a) and (b) in (15) and such that $\psi_h$ is asymptotically normally distributed with zero mean and unit variance. Clearly, $\psi_0$ as defined by (7) belongs to class C.


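4. Comparison of average confidence intervals computed from $\psi_0$ and $\psi_h$. We shall now show that for each fixed value $\theta_0$ of $\theta$ the average rate of change of $\psi_0$ with respect to $\theta$ is greater than that of $\psi_h$ for any $h(x, \theta)$ which is not a constant multiple of $g(x, \theta)$. Consider $\dfrac{\partial\psi_0}{\partial\theta}$ and $\dfrac{\partial\psi_h}{\partial\theta}$ for a given $n$. We have

(18) $\dfrac{\partial\psi_0}{\partial\theta} = \dfrac{1}{\sqrt{n}\,A_0}\sum_{i=1}^{n}\dfrac{\partial g(x_i, \theta)}{\partial\theta},$

(19) $\dfrac{\partial\psi_h}{\partial\theta} = \dfrac{1}{\sqrt{n}\,A_h}\sum_{i=1}^{n}\dfrac{\partial h(x_i, \theta)}{\partial\theta}.$

Now

$$\frac{\partial g(x_i, \theta)}{\partial\theta} = \frac{\dfrac{\partial^2}{\partial\theta^2}\,dF(x_i, \theta)}{dF(x_i, \theta)} - [g(x_i, \theta)]^2.$$

Assuming that

(20) $\displaystyle\int\frac{\partial^2}{\partial\theta^2}\,dF(x, \theta) = \frac{\partial^2}{\partial\theta^2}\int dF(x, \theta) = 0$

and remembering that $E_0[g(x_i, \theta_0)] = 0$, we have

(21) $\Delta_0 = E_0\Big(\dfrac{\partial\psi_0}{\partial\theta}\Big) = -\sqrt{n}\,A_0$

and

(22) $\Delta_h = E_0\Big(\dfrac{\partial\psi_h}{\partial\theta}\Big) = \dfrac{\sqrt{n}}{A_h}\displaystyle\int\frac{\partial h(x, \theta)}{\partial\theta}\,dF(x, \theta).$

Now, since

(23) $\displaystyle\int h(x, \theta)\,dF(x, \theta) = 0$

and assuming that (23) can be differentiated under the integral sign, we have

(24) $\displaystyle\int\frac{\partial h(x, \theta)}{\partial\theta}\,dF(x, \theta) + \int h(x, \theta)\,g(x, \theta)\,dF(x, \theta) = 0.$

For the difference $\Delta_0^2 - \Delta_h^2$ in samples from populations with $\theta = \theta_0$, we have

(25) $\Delta_0^2 - \Delta_h^2 = \dfrac{n}{A_h^2}\Big\{\displaystyle\int\big(g\sqrt{dF}\big)^2\cdot\int\big(h\sqrt{dF}\big)^2 - \Big[\int\big(g\sqrt{dF}\big)\big(h\sqrt{dF}\big)\Big]^2\Big\},$

where $g = g(x, \theta)$, $h = h(x, \theta)$ and $dF = dF(x, \theta)$. Making use of Schwarz's inequality which states that

$$\int g^2(x)\,dx\cdot\int h^2(x)\,dx \ge \Big[\int g(x)\,h(x)\,dx\Big]^2,$$

where the equality sign holds only if $g(x) \equiv K\,h(x)$, $K$ being a constant, it is evident that independently of $n$, $\Delta_0^2 \ge \Delta_h^2$, and furthermore, the only condition under which $\Delta_0^2 = \Delta_h^2$ is that $g(x, \theta)\sqrt{dF(x, \theta)}$ be a constant multiple of $h(x, \theta)\sqrt{dF(x, \theta)}$, that is,

(26) $h(x, \theta) \equiv K\,g(x, \theta).$

Therefore we have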

Theorem 3: If $g(x, \theta)$ and $h(x, \theta)$ satisfy the conditions of Theorems 1 and 2 respectively and furthermore, if (20) is satisfied and if the expression on the left in (23) can be differentiated under the integral sign with respect to $\theta$, then the average rate of change of $\psi_0$ with respect to $\theta$ for each fixed value $\theta_0$ of $\theta$ is greater than that of $\psi_h$ (for which $h(x, \theta) \not\equiv K\,g(x, \theta)$) with respect to $\theta$, when $\theta = \theta_0$ in samples from $\Pi_0$.

This Theorem simply means that when computed from $\psi_0$ the fiducial limits for the true but unknown value $\theta_0$ of the parameter $\theta$, whatever value $\theta_0$ may have on the interval $a < \theta_0 < b$ of possible values, are (for large samples) closer together on the average than those computed from any other $\psi_h$ of class C. There is no function which is more efficient, as it were, for determining confidence intervals for $\theta_0$ than the particular $\psi$ given by (7), which is $\psi_h$ with $h(x, \theta)$ replaced by $g(x, \theta)$, that is, $\dfrac{\partial}{\partial\theta}\log dF(x, \theta)$. The actual manner in which the fiducial limits for $\theta_0$ are found for a given confidence coefficient $1 - \alpha$ is to set

(27) $\dfrac{\Big(\dfrac{\partial L}{\partial\theta}\Big)_0}{\sqrt{n}\,A_0} = \pm\psi_\alpha$

and solve formally for $\theta_0$, where $\psi_\alpha$ is the value for which

$$\frac{2}{\sqrt{2\pi}}\int_{\psi_\alpha}^{\infty} e^{-\frac{1}{2}x^2}\,dx = \alpha,$$

which can be found from normal probability tables. The two values of $\theta_0$ thus found are the fiducial limits $\underline{\theta}$ and $\bar{\theta}$ for the true value $\theta_0$ and we can state that the probability is $1 - \alpha$ that $\underline{\theta}$ and $\bar{\theta}$ will include the true value $\theta_0$ between them. This statement is valid whatever may be the value of $\theta_0$ between $a$ and $b$. This rule, consistently followed for large samples, will produce fiducial limits $\underline{\theta}$ and $\bar{\theta}$ which are closest together on the average, for each fixed value of the probability level $\alpha$ between 0 and 1. It should be observed that no assumptions have been made regarding the existence of sufficient statistics.
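
The comparison in Theorem 3 can be made concrete with a small computation. The sketch below assumes the exponential law $\theta e^{-\theta x}$ of Example 3, for which $g(x, \theta) = 1/\theta - x$, and picks, purely for illustration, the alternative $h(x, \theta) = x^2 - 2/\theta^2$, which has zero mean but is not a multiple of $g$. Expected slopes are evaluated from the closed-form moments $E[x^m] = m!/\theta^m$; all function names are hypothetical.

```python
import math


def exp_moment(m, theta):
    """E[x**m] for the density theta * exp(-theta * x), x > 0."""
    return math.factorial(m) / theta ** m


def mean_slope_score(theta, n):
    """Average of d(psi_0)/d(theta), with g = 1/theta - x and dg/dtheta = -1/theta**2."""
    a0 = math.sqrt(1 / theta ** 2 - 2 * exp_moment(1, theta) / theta
                   + exp_moment(2, theta))          # sqrt(E[g**2])
    return math.sqrt(n) * (-1 / theta ** 2) / a0


def mean_slope_alternative(theta, n):
    """Average of d(psi_h)/d(theta), with h = x**2 - 2/theta**2 and dh/dtheta = 4/theta**3."""
    ah = math.sqrt(exp_moment(4, theta) - 4 * exp_moment(2, theta) / theta ** 2
                   + 4 / theta ** 4)                # sqrt(E[h**2])
    return math.sqrt(n) * (4 / theta ** 3) / ah


theta, n = 2.0, 100
slope_0 = mean_slope_score(theta, n)
slope_h = mean_slope_alternative(theta, n)
ratio = abs(slope_h) / abs(slope_0)
print(slope_0, slope_h, ratio)
```

Under these assumptions the ratio of the absolute average slopes is $4/\sqrt{20} \approx 0.894$, independent of $\theta$ and $n$, so intervals obtained by inverting this $\psi_h$ would on the average be roughly 12 per cent longer than those from $\psi_0$.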


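5. Examples. Example 1. Suppose a large sample of $n$ individuals to be drawn from a population known to have the Poisson distribution law

$$f(x, m) = \frac{m^xe^{-m}}{x!}.$$

We have

$$L = -\sum\log x_i! + \Big(\sum x_i\Big)\log m - nm,$$

$$\Big(\frac{\partial L}{\partial m}\Big)_0 = \frac{\sum x_i}{m_0} - n,$$

$$A_0^2 = E_0\Big[\Big(\frac{\partial\log f}{\partial m}\Big)_0^2\Big] = E_0\Big[\Big(\frac{x}{m_0} - 1\Big)^2\Big] = \frac{1}{m_0}.$$

The fiducial limits $\underline{m}$ and $\bar{m}$ for $\alpha = .05$, that is, the 95 per cent fiducial limits, are found by formally solving the equations

$$\frac{\sqrt{n}\,(\bar{x} - m_0)}{\sqrt{m_0}} = \pm 1.96$$

for $m_0$, where $\bar{x}$ is the mean of the sample. The fiducial limits are found to be

$$\underline{m},\ \bar{m} = \bar{x} + \frac{(1.96)^2}{2n} \mp 1.96\sqrt{\frac{\bar{x}}{n} + \frac{(1.96)^2}{4n^2}}.$$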
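
For the Poisson case, setting $\sqrt{n}\,(\bar{x} - m_0)/\sqrt{m_0} = \pm 1.96$ and squaring gives a quadratic in $m_0$ whose two roots are the limits. A minimal sketch of that solution follows; the function names and sample figures are assumptions for illustration only.

```python
import math


def poisson_limits(xbar, n, z=1.96):
    """Roots of n * (xbar - m)**2 = z**2 * m, i.e. the large-sample limits for m."""
    c = z * z / n
    half_width = math.sqrt(c * xbar + c * c / 4)
    centre = xbar + c / 2
    return centre - half_width, centre + half_width


def psi0_poisson(xbar, n, m):
    """Standardized score sqrt(n) * (xbar - m) / sqrt(m)."""
    return math.sqrt(n) * (xbar - m) / math.sqrt(m)


lower, upper = poisson_limits(xbar=4.2, n=200)
print(lower, upper)
print(psi0_poisson(4.2, 200, lower), psi0_poisson(4.2, 200, upper))
```

Substituting the two limits back into the standardized score should return $+1.96$ and $-1.96$ respectively.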



Example 2. Consider a large sample of $n$ individuals known to be from a binomial population having the two classes $A$ and $B$. Let $p$ denote the probability of an individual's belonging to $A$ and $q = 1 - p$ that of belonging to $B$. Let $x$ denote the number of individuals belonging to $A$ in one drawing from the population; $x$ will take on only two possible values, 1 and 0, with probabilities $p$ and $q$ respectively. The population distribution is thus

$$f(x, p) = p^x(1 - p)^{1-x}.$$

We have

$$L = \sum x_i\log p + \sum(1 - x_i)\log(1 - p),$$

$$\Big(\frac{\partial L}{\partial p}\Big)_0 = \frac{m}{p_0} - \frac{n - m}{1 - p_0} = \frac{m - np_0}{p_0(1 - p_0)},$$

where $m$ is the number of individuals belonging to $A$ in the sample. Furthermore

$$A_0^2 = \frac{1}{p_0(1 - p_0)}.$$

95 per cent fiducial limits for $p_0$ are got by solving the following equation for $p_0$:

$$\frac{m - np_0}{\sqrt{n}\,\sqrt{p_0(1 - p_0)}} = \pm 1.96.$$

It will be seen that situations, such as frequently occur in genetics, where $p$ may be a function of some other parameter $\theta$, say $p = \omega(\theta)$, can be handled by simply replacing $p_0$ by $\omega(\theta_0)$ and solving for $\theta_0$.
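
In the binomial case the equation $(m - np_0)/\sqrt{np_0(1 - p_0)} = \pm 1.96$ is again a quadratic in $p_0$ after squaring. The sketch below writes out its two roots; the function names and the counts used are hypothetical.

```python
import math


def binomial_limits(m, n, z=1.96):
    """Roots of (m - n*p)**2 = z**2 * n * p * (1 - p), the large-sample limits for p."""
    denom = 2 * (n + z * z)
    centre = (2 * m + z * z) / denom
    half_width = z * math.sqrt(z * z + 4 * m * (1 - m / n)) / denom
    return centre - half_width, centre + half_width


def psi0_binomial(m, n, p):
    """Standardized score (m - n*p) / sqrt(n * p * (1 - p))."""
    return (m - n * p) / math.sqrt(n * p * (1 - p))


lower, upper = binomial_limits(m=30, n=100)
print(lower, upper)
print(psi0_binomial(30, 100, lower), psi0_binomial(30, 100, upper))
```

When $p = \omega(\theta)$ for a monotonic $\omega$, the same two roots could be mapped through the inverse of $\omega$ to obtain limits for $\theta_0$, in line with the remark above.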

Example 3. Let the form of the distribution function be $f(x, \theta) = \theta e^{-\theta x}$, where
$0 < x < \infty$. For a sample of n individuals,

$L = n \log \theta - \theta \sum x_i.$







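The 95 per cent fiducial limits $\underline{\theta}$ and $\bar{\theta}$ are given by solving the equations

$\dfrac{n/\theta_0 - \sum x_i}{\sqrt{n}/\theta_0} = \pm 1.96$

for $\theta_0$. We get

$\underline{\theta} = \dfrac{1 - 1.96/\sqrt{n}}{\bar{x}}, \qquad \bar{\theta} = \dfrac{1 + 1.96/\sqrt{n}}{\bar{x}},$

where $\bar{x}$ is the mean of the sample.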


Princeton University.


[1] R. A. Fisher, "The Concepts of Inverse Probability and Fiducial Probability Referring
to Unknown Parameters," Proc. Royal Society of London, Series A, vol. 139
(1933), pp. 343-348.

[2] R. A. Fisher, "The Fiducial Argument in Statistical Inference," Annals of Eugenics,
vol. 6 (1935), pp. 391-398.

[3] J. Neyman, "On the Two Different Aspects of the Representative Method: the Method
of Stratified Sampling and the Method of Purposive Selection," Jour. Royal Statistical
Society, vol. 97, 1934, pp. 558-625.

[4] J. Neyman, "Outline of a Theory of Statistical Estimation Based on the Classical
Theory of Probability," Phil. Trans. Roy. Soc. London, Series A, vol. 236 (1937),
pp. 333-380.

[5] H. Cramér, Random Variables and Probability Distributions, Cambridge Tracts in
Mathematics and Mathematical Physics, No. 36, Cambridge University Press,
1937.



TRANSFORMATIONS OF THE PEARSON TYPE III DISTRIBUTION 

By A. C. Olshen 
I. Introductory 

Transformations of the normal curve have been used as a basis for the representation
of skew frequency distributions by Edgeworth, Kapteyn, Van Uven,
Bernstein, and others. Various studies have been made of the distributions
obtained by replacing each of a set of normally distributed variates by a logarithmic
function of the variates. Among the earlier investigators along this
line were Galton, and McAllister; later, works by Jorgensen, Fisher, Wicksell,
Davies, and a more recent study by Pae-Tsi-Yuan, were added.

Rietz¹ restated and treated, in a general fashion, the question as to the properties
of the distribution of powers of a set of variates which are known to be
normally distributed. By a suitable choice for the origin of the normal curve,
he obtained results which are applicable in answering questions which frequently
arise in the applied field concerning the properties of families of interrelated
distributions, one strain of which is known to be normally distributed. For
example, in the family made up of the diameters, surface areas, volumes, etc. of
some physical quantity, if it were known that one set, the surface areas for
instance, were distributed normally, then from his results we have the properties
of the distributions of any of the other sets.

Likewise it has seemed of interest to investigate, in a similar fashion, the
properties of the transformed Type III Pearson distribution. We shall treat
both the power and logarithmic transformations. For instance, if we knew
that any one of the physical measurements, velocity, kinetic energy, momentum,
or centrifugal force (all of which are functions of the velocity) were distributed
according to a Type III curve, then we raise the question as to the properties
of the distributions of any of the others. Similarly, if the intensity of certain
light, I, were known to be distributed according to a Type III law, we will
discuss the properties of the distribution of the brightness, B, of the light as
seen by the eye, since the two are known to be related by the law B = K log I.
The same analysis applies to the relationship between L, the loudness of a
sound, and E, the energy in the sound wave, since L = K log E.

Two forms of the Type III distribution will be considered. In the first form,
all the variates are taken positive; in the second form, the origin is at the mean
and the variates are measured in units of standard deviation.

¹ H. L. Rietz, Frequency Distributions Obtained By Certain Transformations of Normally
Distributed Variates, Annals of Math., Vol. 23 (1922) pp. 291-300.





In the last section, a transformation is developed which will transform the
ordinates of a given probability function into the ordinates of the normal curve,
$y = Ce^{-t^2/2}$, to within certain approximations. This transformation is applied
to the Type III distribution and to the distribution obtained under power
transformations of variates of the Type III distribution.

II. Power Transformations 

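a. Type III curve with all variates positive.

Given the Pearson Type III law,

(1)  $y = y_0\, x^{\gamma\bar{x} - 1} e^{-\gamma x}, \quad 0 \le x < \infty,$

where

(2)  $\gamma = \dfrac{2\mu_2}{\mu_3} > 0, \quad \gamma\bar{x} > 1, \quad x_{mo.} = \bar{x} - \dfrac{1}{\gamma},^2 \quad y_0 = \dfrac{\gamma^{\gamma\bar{x}}}{\Gamma(\gamma\bar{x})}.$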

The probability function (1) is a single-valued, real-valued, non-negative,
continuous function of x with $\int_0^\infty y\,dx = 1$. The probability that a variate
chosen at random will fall into the interval $a_1$ to $a_2$ is given by

(3)  $\int_{a_1}^{a_2} y\,dx.$


Let us make a transformation by replacing each variate x by x', where $x' = x^n$,
and n is a real number on which restrictions will be placed as we proceed. When
n is such that x' may have more than one value corresponding to an assigned
value of x we shall consider only the principal value of x'. Then $dx' = nx^{n-1}dx$,
except at x = 0 when n < 1, and $dx = \dfrac{dx'}{n x'^{(n-1)/n}}$, except at x' = 0 when n > 1, or
n < 0.

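The frequency function of the x' variates is given by

(4)  $f(x') = \dfrac{y_0}{n}\, x'^{\frac{\gamma\bar{x} - n}{n}} e^{-\gamma x'^{1/n}},$

which does not represent a Type III curve when $n \ne 1$. The function (4) is
discontinuous at x' = 0 if $\gamma\bar{x}/n < 1$. Likewise, corresponding to (3) we have

(5)  $\dfrac{y_0}{n} \int_{a_1^n}^{a_2^n} x'^{\frac{\gamma\bar{x} - n}{n}} e^{-\gamma x'^{1/n}}\,dx'.$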


² The expression $x_{mo.}$ represents the mode, and $x_{md.}$ represents the median.





In order to study the maxima and minima points of (4) we take the derivative

(6)  $\dfrac{df(x')}{dx'} = f(x')(nx')^{-1}\left\{-\gamma x'^{1/n} + \gamma\bar{x} - n\right\}.$

The derivative changes signs at

(7)  $x' = \left(\bar{x} - \dfrac{n}{\gamma}\right)^n.$




Thus, variates in an interval, dx', at the mode of the new distribution (4) come
by transformation, from an interval in the neighborhood of $x = \bar{x} - \dfrac{n}{\gamma}$ which is
to the left of the mode of (1) when n > 1. The function (4) will be a monotone
decreasing or a unimodal continuous distribution with mode given by (7)
according as $\bar{x}$ is equal to or is greater than $\dfrac{n}{\gamma}$.

It will prove convenient to discuss the properties of (4) under three headings,
according as n > 1, 0 < n < 1, and n < 0, where n or its reciprocal is an integer.

Case I. n > 1.

When $\bar{x} < \dfrac{n}{\gamma}$, (4) is a monotone decreasing function, infinite at the origin and
asymptotic to the x' axis; in this case the distribution of x' is similar to the
distribution arising in the corresponding transformation of a set of normally
distributed³ variates, when $\bar{x}^2 < 4(n - 1)$, where $\bar{x}$ is the arithmetic mean of the
x's of the normal curve. However, we are primarily interested in the case
when $\bar{x} > \dfrac{n}{\gamma}$, under which condition a mode exists on the frequency curve f(x')
and is given by $x'_{mo.} = \left(\bar{x} - \dfrac{n}{\gamma}\right)^n$. Henceforth in discussing the comparative
values of the measures of central tendency, it will be assumed that the condition
$\bar{x} > \dfrac{n}{\gamma}$ is satisfied. We have,

$\left(\bar{x} - \dfrac{n}{\gamma}\right)^n < \left(\bar{x} - \dfrac{1}{\gamma}\right)^n, \quad \text{where } n > 1.$

Thus, while variates at the modal value of x in the Type III distribution transform
into $x' = \left(\bar{x} - \dfrac{1}{\gamma}\right)^n$, the mode of the new distribution is at $x'_{mo.} = \left(\bar{x} - \dfrac{n}{\gamma}\right)^n$
which, when n > 1, is to the left of the positions to which variates at the mode
of the Type III distribution were transformed. Furthermore, as n increases,
$x'_{mo.}$ approaches the origin.


³ Cf. Rietz, loc. cit. p. 296.





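The arithmetic mean of the x''s distributed in accord with the function (4) is
given by

$\bar{x}' = \dfrac{y_0}{n} \int_0^\infty x'^{\gamma\bar{x}/n} e^{-\gamma x'^{1/n}}\,dx' = \dfrac{\Gamma(\gamma\bar{x} + n)}{\gamma^n\,\Gamma(\gamma\bar{x})}.$

Similarly the sth moment about the origin is

(9)  $\mu'_s = \dfrac{\Gamma(\gamma\bar{x} + sn)}{\gamma^{sn}\,\Gamma(\gamma\bar{x})} = \dfrac{(\gamma\bar{x} + sn - 1)^{(sn)}}{\gamma^{sn}}.$

But, when n is an integer,

$\bar{x}' = \bar{x}\left(\bar{x} + \dfrac{1}{\gamma}\right)\left(\bar{x} + \dfrac{2}{\gamma}\right) \cdots \left(\bar{x} + \dfrac{n - 1}{\gamma}\right),$

which is greater than $(\bar{x})^n$, hence $\bar{x}' > \bar{x}^n$. Thus, while variates at the mean
value of x in (1) transform into $(\bar{x})^n$, the mean of (4) is at $\bar{x}'$ which is to
the right of the positions to which variates at the mean of (1) were transformed.
We have

$\left(\bar{x} - \dfrac{n}{\gamma}\right)^n < (\bar{x})^n < \dfrac{\Gamma(\gamma\bar{x} + n)}{\gamma^n\,\Gamma(\gamma\bar{x})},$

hence

(10)  $x'_{mo.} < \bar{x}'.$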


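In 1895, Karl Pearson⁴ showed that the median of the Type III curve was
approximately two-thirds of the distance from the mode to the mean, and later
Doodson⁵ gave similar results. The analysis of (1) along this line is given in
Section IV. However, since $x_{mo.} = \bar{x} - \dfrac{1}{\gamma}$ we may take $x_{md.} = \bar{x} - \dfrac{1}{c\gamma}$, where
c > 1, (approximately equal to 3). Then $x'_{md.} = \left(\bar{x} - \dfrac{1}{c\gamma}\right)^n$.

We have

(12)  $\left(\bar{x} - \dfrac{n}{\gamma}\right)^n < \left(\bar{x} - \dfrac{1}{c\gamma}\right)^n < (\bar{x})^n,$

hence from (10)

(13)  $x'_{mo.} < x'_{md.} < \bar{x}'.$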


⁴ Karl Pearson, Skew Variation in Homogeneous Material, Philosophical Transactions,
Vol. 186A, part 1, (1895) pp. 343-414.

⁵ Arthur T. Doodson, Relation of Mode, Median and Mean in Frequency Curves, Biometrika,
Vol. 11, (1917) p. 425.





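Considering the case when n = 2, we have with the aid of (9),

$\alpha_3'^2 = \dfrac{8(5\gamma^2\bar{x}^2 + 17\gamma\bar{x} + 15)^2}{(\gamma\bar{x})(\gamma\bar{x} + 1)(2\gamma\bar{x} + 3)^3}$

and

$\alpha_4' = \dfrac{3(4\gamma^4\bar{x}^4 + 72\gamma^3\bar{x}^3 + 337\gamma^2\bar{x}^2 + 629\gamma\bar{x} + 420)}{(\gamma\bar{x})(\gamma\bar{x} + 1)(2\gamma\bar{x} + 3)^2}.$

From the moments of (1), one readily gets

$\alpha_3^2 = \dfrac{4}{\gamma\bar{x}}$

and

$\alpha_4 = \dfrac{3(\gamma\bar{x} + 2)}{\gamma\bar{x}}.$

It can be shown easily that $\alpha_3'^2 > \alpha_3^2$ and $\alpha_4' > \alpha_4$; hence the distribution of
the squares of variates is more leptokurtic and more skew than the original
distribution.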
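The two closed forms for the squared variates can be verified exactly from the raw moments of (1). The sketch below is an illustration and not the author's computation: it sets $\gamma = 1$ (the standard moments do not depend on the scale), writes k for $\gamma\bar{x}$, and uses hypothetical function names.

```python
from fractions import Fraction

def raw_moment(r, k):
    """E[x^r] for density (1) with gamma = 1 and k = gamma * xbar: k(k+1)...(k+r-1)."""
    out = Fraction(1)
    for j in range(r):
        out *= k + j
    return out

def squared_variate_shape(k):
    """(alpha_3^2, alpha_4) of x' = x^2, built from central moments of x'."""
    m1, m2, m3, m4 = (raw_moment(2 * s, k) for s in (1, 2, 3, 4))
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return mu3**2 / mu2**3, mu4 / mu2**2

def closed_forms(k):
    """The printed expressions for alpha_3'^2 and alpha_4' when n = 2."""
    a3sq = 8 * (5 * k**2 + 17 * k + 15) ** 2 / (k * (k + 1) * (2 * k + 3) ** 3)
    a4 = 3 * (4 * k**4 + 72 * k**3 + 337 * k**2 + 629 * k + 420) / (k * (k + 1) * (2 * k + 3) ** 2)
    return a3sq, a4

k = Fraction(5)  # hypothetical value of gamma * xbar
print(squared_variate_shape(k) == closed_forms(k))
# Original curve: alpha_3^2 = 4/k and alpha_4 = 3(k + 2)/k, both smaller.
print(closed_forms(k)[0] > Fraction(4) / k, closed_forms(k)[1] > 3 * (k + 2) / k)
```

Exact rational arithmetic is used so that agreement between the two routes is an identity check rather than a floating-point comparison.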

From (10) and (12) it is evident that the mode approaches neither the median
nor the mean as n increases, subject to the condition $\gamma\bar{x} > n$. Each of the
ratios of the mode to the median and to the mean approaches the limit 1 as $\bar{x}$ is
increased indefinitely, the rapidity of approach to the limiting value depending
on the size of n.

Taking the second derivative to find the points of inflection of the function
(4), we have

$\dfrac{d^2f(x')}{dx'^2} = f(x')(nx')^{-2}\left\{\gamma^2 x'^{2/n} + \gamma x'^{1/n}(3n - 2\gamma\bar{x} - 1) + (\gamma^2\bar{x}^2 - 3n\gamma\bar{x} + 2n^2)\right\}.$

When the points of inflection exist they are given by

(14)  $x' = \left[\dfrac{(2\gamma\bar{x} - 3n + 1) \pm \sqrt{n^2 - 6n + 1 + 4\gamma\bar{x}}}{2\gamma}\right]^n.$

Under the restriction that $\gamma\bar{x} \ge n$, the expression under the radical in (14)
cannot vanish, and will always be positive.

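Case II. 0 < n < 1.

We now consider the distribution obtained by taking positive integral roots
of a set of variates distributed in accord with (1). The mode of f(x'), as given by
(7), will always exist since from (2), $\gamma\bar{x} > 1 > n$.

We have

(15)  $\left(\bar{x} - \dfrac{n}{\gamma}\right)^n < \left(\bar{x} - \dfrac{1}{c\gamma}\right)^n$ if $n > \dfrac{1}{c}$,

      $\left(\bar{x} - \dfrac{n}{\gamma}\right)^n > \left(\bar{x} - \dfrac{1}{c\gamma}\right)^n$ if $n < \dfrac{1}{c}$,

      $\left(\bar{x} - \dfrac{n}{\gamma}\right)^n = \left(\bar{x} - \dfrac{1}{c\gamma}\right)^n$ if $n = \dfrac{1}{c}$.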





Hence it is evident that $x'_{mo.}$ is less than, greater than, or equal to $x'_{md.}$ according
as n is greater than, less than, or equal to 1/c. In power transformations of
symmetrical distributions⁶ the results differ in that the modal value is always
greater or less than the median according as the value of n lies between 0 and 1
or outside of these bounds.

Here, $\left(\bar{x} - \dfrac{n}{\gamma}\right)^n > \left(\bar{x} - \dfrac{1}{\gamma}\right)^n$, hence in contrast to Case I, the mode of the new
distribution is to the right of the position to which variates at the mode of the
Type III distribution were transformed.

It has been shown⁷ for every set of positive values that $\bar{x}' > (\bar{x})^n$ when n lies
outside of the interval 0 to 1, and that $\bar{x}' < (\bar{x})^n$ when n lies between 0 and 1.
We have then

(16)  $\dfrac{\Gamma(\gamma\bar{x} + n)}{\gamma^n\,\Gamma(\gamma\bar{x})} = \bar{x}' < (\bar{x})^n.$

The mean of the new distribution is to the left of the position to which the
variates at the mean of the Type III distribution were transformed.

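In Section IV, with the aid of certain approximating assumptions, it will be
shown that $\bar{x}' > x'_{md.}$ when $x'_{md.} > x'_{mo.}$ and conversely, $\bar{x}' < x'_{md.}$ when $x'_{md.} < x'_{mo.}$.

Case III. n < 0.

Let n = -m, where m is a positive integer. Then we have

(17)  $f(x') = \dfrac{y_0}{m}\, x'^{-\frac{\gamma\bar{x} + m}{m}} e^{-\gamma x'^{-1/m}}, \quad f(x') = 0 \text{ at } x' = 0.$

In place of (6) we have

(18)  $\dfrac{df(x')}{dx'} = -f(x')\left(m x'^{\frac{m+1}{m}}\right)^{-1}\left\{(\gamma\bar{x} + m)\,x'^{1/m} - \gamma\right\},$

and (7) becomes

(19)  $x' = \left(\bar{x} + \dfrac{m}{\gamma}\right)^{-m}.$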


⁶ H. L. Rietz, On Certain Properties of Frequency Distributions, Proc. National Academy
of Science, Vol. 13, (1927) p. 820.

⁷ J. L. W. V. Jensen, On Convex Functions and the Inequalities between the Means,
Acta Mathematica, Vol. 30, pp. 175-193.





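Hence, as in Case I, the mode of (17) is to the left of the position to which
variates at the mode of (1) were transformed while the mean of (17) is to the
right of the position to which the variates at the mean of (1) were transformed.
Since

$\left(\bar{x} + \dfrac{m}{\gamma}\right)^{-m} < \left(\bar{x} - \dfrac{1}{c\gamma}\right)^{-m},$

we have $x'_{mo.} < x'_{md.}$. Also

$\bar{x}' = \dfrac{\gamma^m\,\Gamma(\gamma\bar{x} - m)}{\Gamma(\gamma\bar{x})} > \left(\bar{x} - \dfrac{1}{c\gamma}\right)^{-m},$

hence $\bar{x}' > x'_{md.}$. Therefore

(20)  $x'_{mo.} < x'_{md.} < \bar{x}'.$

As a special case, when n = -1, (17) reduces to

(21)  $f(x') = y_0\, x'^{-(\gamma\bar{x} + 1)} e^{-\gamma/x'},$

which is a Pearson Type V distribution.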
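b. Type III curve with mean zero and unit variance.

Even though the form of the Type III distribution with which we have been
dealing, wherein all the variates are positive, is more closely akin to actual
distributions that may arise in applied problems, nevertheless it will be of interest
to examine the properties of the transformed curve when the mean is taken as
zero, with unit variance.

The second and third moments about the mean of the distribution (1) are
$\mu_2 = \bar{x}/\gamma$ and $\mu_3 = 2\bar{x}/\gamma^2$. If we write $\alpha_3$ for the third standard moment then

(22)  $\gamma\bar{x} = \dfrac{4}{\alpha_3^2}.$

By replacing the variable, x, in (1) by the expression

(23)  $x = \bar{x}\left(1 + \dfrac{\alpha_3}{2}t\right)$

we obtain the Type III distribution

(24)  $f(t) = y_0\left(1 + \dfrac{\alpha_3}{2}t\right)^{\frac{4}{\alpha_3^2} - 1} e^{-\frac{2}{\alpha_3}t}, \quad -\dfrac{2}{\alpha_3} \le t < \infty,$

where now $y_0 = \dfrac{(4/\alpha_3^2)^{4/\alpha_3^2 - 1/2}}{\Gamma(4/\alpha_3^2)}\, e^{-4/\alpha_3^2}.$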






Equation (22) lends itself to a simple interpretation of the restriction made in
Section IIa, Case I, that $\gamma\bar{x} > n$ in order that the mode of (4) exist and be given
by (7). The upper bound in the values of $\alpha_3$ considered by Salvosa⁸ in the
computation of his tables was $\alpha_3 = 1.1$. Upon examination of the tables it is
obvious that in most cases the skewness of the Type III distribution, as measured
by $\alpha_3$, will be less than 1.1. Hence in most cases we will have $3.22 < \gamma\bar{x} < \infty$.
The effect of the limitations imposed by the condition $\gamma\bar{x} > n$ may be inferred to
some extent from the following table.


TABLE I

The upper bound of $\alpha_3$ for the existence of a mode in Case I, Section IIa

n             2     3     4     5     6     7     8     9     10    25    50    100
$\alpha_3$   1.41  1.15  1.00   .89   .82   .76   .71   .67   .63   .40   .28   .20
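The entries of Table I follow from (22): the condition $\gamma\bar{x} > n$ is the same as $4/\alpha_3^2 > n$, that is $\alpha_3 < 2/\sqrt{n}$. A minimal sketch (the function name is hypothetical) that reproduces the bound for the tabulated values of n:

```python
import math

def alpha3_upper_bound(n):
    """Largest alpha_3 for which gamma * xbar = 4 / alpha_3**2 still exceeds n."""
    return 2.0 / math.sqrt(n)

for n in (2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100):
    print(n, f"{alpha3_upper_bound(n):.2f}")
```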


When we make a transformation by replacing each variate t in (24) by t'
where $t' = t^n$ ($n \ne 0$) and n is an integer (positive or negative) or the reciprocal
of an integer, then $dt' = nt^{n-1}dt$, except at t = 0 when n < 1, and $dt = \dfrac{dt'}{n t'^{(n-1)/n}}$,
except at t' = 0 when n > 1, or n < 0. The function (24) becomes

(25)  $f(t') = \dfrac{y_0}{n}\left(1 + \dfrac{\alpha_3}{2}t'^{1/n}\right)^{\frac{4}{\alpha_3^2} - 1} e^{-\frac{2}{\alpha_3}t'^{1/n}}\, t'^{-\frac{n-1}{n}}.$

The distribution function, f(t'), is infinite at t' = 0 when n > 1. In place of
(5), we have

(26)  $\int_{a_1^n}^{a_2^n} f(t')\,dt'.$

Here $a_1$ and $a_2$ are taken to be positive or zero when n is even. When n is odd $a_1$
and $a_2$ may be taken negative as (25) will give the frequency curve for negative
values of t' that arise from setting $t' = t^n$ when t is negative. Examining for
maxima and minima points, we have

(27)  $\dfrac{df(t')}{dt'} = -f(t')\left\{nt'\left(1 + \dfrac{\alpha_3}{2}t'^{1/n}\right)\right\}^{-1}\left\{t'^{2/n} + \dfrac{n\alpha_3}{2}t'^{1/n} + (n - 1)\right\}.$


⁸ Luis R. Salvosa, Tables of Pearson's Type III Function, Annals of Mathematical
Statistics, Vol. 1, (1930) pp. 191-198.





The derivative changes signs at

(28)  $t'^{1/n} = -\dfrac{n\alpha_3}{4} \pm \dfrac{1}{2}\sqrt{\dfrac{n^2\alpha_3^2}{4} - 4(n - 1)}$

when $\dfrac{n^2\alpha_3^2}{4} > 4(n - 1)$, and at t' = 0 for certain values of n.

When n > 1, $\dfrac{n^2\alpha_3^2}{4} > 4(n - 1)$ only for very large values of n since $\alpha_3 < 2$.
When n is odd
and positive, the derivative changes signs at t' = 0. If n is the reciprocal of an
odd positive integer greater than one, there is a minimum at t' = 0, and the
function f(t') is zero at this point. Further properties of the frequency curves
given by (25) will be discussed under the three cases treated in Section IIa.

Case I. n > 1.

When $\dfrac{n^2\alpha_3^2}{4} > 4(n - 1)$, it can be shown that (28) gives neither a maximum
nor minimum point of f(t') since $t'^{1/n}$ will always be less than $t_l$.⁹ Similarly,
when $\dfrac{n^2\alpha_3^2}{4} < 4(n - 1)$, there is neither a maximum nor minimum since (28)
is imaginary. When n is odd, f(t') is infinite at the origin and is a monotone
increasing function of t' from the lower bound to the origin, and a monotone
decreasing function of t' from the origin. When n is even, f(t') is a monotone
decreasing function of t', infinite at the origin. The forms of the distributions
in this case are similar to those arising in power transformations of normally
distributed variates¹⁰ when n > 1 and $\bar{x}^2 < 4(n - 1)$ and also to the forms arising
in Section IIa, Case I, when $\bar{x} < \dfrac{n}{\gamma}$.

Even though we have a discontinuity at the origin, the total area under the
curve is one, which is evident since we can integrate function (25) over the entire
range of t' when n is odd and positive.

Case II. 0 < n < 1.

This case includes the distribution obtained by taking positive integral roots
of a set of variates. As in the study of the normal distribution,¹¹ we limit our
considerations to the principal real values of the functions. When 1/n is odd, there
is a minimum at t' = 0 and a maximum given by each of the two signs before
the radical in (28). Hence in this case, we have one minimum and two maxima.

With $\alpha_3 = 1$ in (24) and n = 1/3, $t_{md.} = -.164$ and $t_{mo.} = -.500$. The
transformed distribution gives $t'_{md.} = -.547$ and two modes, the primary mode
$t'_{mo.} = -.967$, and the secondary mode $t'_{mo.} = .903$. In contrast to the corresponding
transformation of normally distributed variates, the primary mode
is less than the median.

⁹ The expression $t_l$ represents the lower bound of t in distribution (24).

¹⁰ H. L. Rietz, Cf. loc. cit. p. 296.

¹¹ Cf. Rietz, loc. cit. p. 297.





TABLE II

Comparison of the Type III Distribution and the Transformed Distribution when
$\alpha_3 = .6$, n = 3

    t        f(t)          t'           f(t')
  -3.00    .000001    -27.000000       0
  -2.50    .001347    -15.625000       .000072
  -2.00    .029467     -8.000000       .002456
  -1.75    .072787     -5.359375       .007922
  -1.50    .139285     -3.375000       .020635
  -1.25    .220462     -1.953125       .047032
  -1.00    .301350     -1.000000       .100450
   -.50    .405345      -.125000       .540460
   -.25    .414211      -.015625      2.209125
   -.10    .406131      -.001000     13.537700
   -.05    .401485      -.000125     53.531333
   -.02    .398272      -.000008    331.893333
    0      .395962       0             ∞
    .02    .393522       .000008    327.935000
    .05    .389628       .000125     51.950400
    .10    .382549       .001000     12.751633
    .15    .374796       .003375      5.552533
    .25    .357533       .015625      1.906843
    .50    .307293       .125000       .409724
    .75    .252971       .421875       .149909
   1.00    .200493      1.000000       .066831
   1.50    .114233      3.375000       .016923
   2.00    .058376      8.000000       .004865
   2.50    .027286     15.625000       .001455
   3.00    .011836     27.000000       .000438
   3.50    .004820     42.875000       .000131
   4.00    .001859     64.000000       .000039
   5.00    .000242    125.000000       .000003
   6.00    .000027    216.000000       0


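Case III. n < 0.

Let n = -m, where m is a positive integer. Then (25) becomes

(29)  $f(t') = \dfrac{y_0}{m}\left(1 + \dfrac{\alpha_3}{2}t'^{-1/m}\right)^{\frac{4}{\alpha_3^2} - 1} e^{-\frac{2}{\alpha_3}t'^{-1/m}}\, t'^{-\frac{m+1}{m}}.$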








TABLE III

Comparison of the Type III Distribution and the Transformed Distribution When
$\alpha_3 = 1$, n = 1/3

    t        f(t)         t'         f(t')
  -2.00     0         -1.259921      0
  -1.75    .025272    -1.205071      .11010
  -1.50    .122626    -1.144714      .48206
  -1.25    .251021    -1.077217      .87385
  -1.00    .360894    -1.000000     1.08268
   -.90    .393277     -.965489     1.09980
   -.75    .427525     -.908560     1.05874
   -.50    .448084     -.793701      .84683
   -.27    .433958     -.646330      .54385
   -.08    .405678     -.430887      .22596
    0      .390734      0            0
    .08    .374535      .430887      .20861
    .27    .332927      .646330      .41723
    .50    .280748      .793701      .53058
    .64    .249865      .861774      .55669
    .74    .228711      .904504      .56134
    .90    .196904      .965489      .55064
   1.00    .178470     1.000000      .53541
   1.50    .104259     1.144714      .40985
   2.00    .057252     1.259921      .27265
   2.50    .029989     1.357209      .16572
   3.00    .015133     1.442250      .09443
   3.50    .007410     1.518294      .05125
   4.00    .003539     1.587401      .02675
   4.50    .001655     1.650964      .01353
   5.00    .000761     1.709976      .00668
   5.50    .000344     1.765174      .00322

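Taking the derivative,

(30)  $\dfrac{df(t')}{dt'} = -f(t')\left\{m\,t'^{\frac{m+2}{m}}\left(1 + \dfrac{\alpha_3}{2}t'^{-1/m}\right)\right\}^{-1}\left\{(m + 1)\,t'^{2/m} + \dfrac{m\alpha_3}{2}t'^{1/m} - 1\right\},$

and in place of (28) we have

(31)  $t'^{1/m} = \dfrac{-\dfrac{m\alpha_3}{2} \pm \sqrt{\left(\dfrac{m\alpha_3}{2}\right)^2 + 4(m + 1)}}{2(m + 1)}.$

The transformed distribution has little statistical significance for odd values
of m, since f(t') is a disjoined distribution. There are no values for f(t') in the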





interval, $-\left(\dfrac{\alpha_3}{2}\right)^m < t' < 0$, since $-\dfrac{2}{\alpha_3} < t < \infty$. The transformed distribution is
thus composed of two sections, each with its own mode. The section for negative
values of t', with range $-\infty < t' < -\left(\dfrac{\alpha_3}{2}\right)^m$, has a mode given by (31) with the
negative sign before the radical. The section for positive values of t', with range
$0 < t' < \infty$, has a mode given by (31) with the positive sign before the radical.

When m is an even integer, if we assign to f(t') the value 0 when t' = 0, f(t')
becomes a continuous unimodal distribution in the interval $0 \le t' < \infty$, with
the mode given by (31), with the positive sign before the radical.


III. Logarithmic Transformations

As indicated in the introduction, numerous studies have been made of the
distributions obtained by replacing normally distributed variates by exponential
functions of the variates. If a variate x, with range $-\infty < x < \infty$, is distributed
normally with mean zero and unit variance, then by replacing x by x',
where $x' = c + e^{kx}$, the range of x' becomes $c < x' < \infty$. Likewise if a variate x
is distributed in accord with a Type III law, with range $0 \le x < \infty$, then, in
making the above transformation, the range of x' becomes $(c + 1) \le x' < \infty$.

We shall now study the properties of the distribution of x' obtained by the
above transformation applied to distribution (1). Because of the similarity of
the properties of the transformed frequency distributions, we shall take k = 1
and c = 0.

Letting $x' = e^x$, (1) is transformed into

(32)  $f(x') = y_0 (\log x')^{\gamma\bar{x} - 1}\, x'^{-(\gamma + 1)}, \quad 1 \le x' < \infty.$

Then,

(33)  $\dfrac{df(x')}{dx'} = f(x')(x' \log x')^{-1}\left\{(\gamma\bar{x} - 1) - (\gamma + 1)\log x'\right\}.$

The derivative changes signs at

(34)  $x' = e^{\frac{\gamma\bar{x} - 1}{\gamma + 1}}.$

The arithmetic mean of the x''s distributed in accord with (32) is given by

(35)  $\bar{x}' = y_0 \int_1^\infty (\log x')^{\gamma\bar{x} - 1}\, x'^{-\gamma}\,dx' = \left(\dfrac{\gamma}{\gamma - 1}\right)^{\gamma\bar{x}}, \quad \gamma > 1.$





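The integral is divergent when $0 < \gamma \le 1$, hence for these cases we take
$0 < k < \gamma$, then

(36)  $\bar{x}' = \left(\dfrac{\gamma}{\gamma - k}\right)^{\gamma\bar{x}}.$

Likewise in order that the first s moments about the origin be finite when
$0 < \gamma \le s$, we must have $0 < k < \dfrac{\gamma}{s}$.

The median of the distribution of x''s corresponds to $x_{md.} = \bar{x} - \dfrac{1}{c\gamma}$,
hence

(37)  $\log x'_{md.} = \bar{x} - \dfrac{1}{c\gamma}, \quad x'_{md.} = e^{\bar{x} - \frac{1}{c\gamma}}.$

1. The relative positions of the averages. We have $e^{\frac{\gamma\bar{x} - 1}{\gamma + 1}} < e^{\bar{x} - \frac{1}{c\gamma}}$, since
$\dfrac{\gamma\bar{x} - 1}{\gamma + 1} < \bar{x} - \dfrac{1}{c\gamma}$. Hence

(38)  $x'_{mo.} < x'_{md.}.$

Also,

$\gamma\bar{x} \log\left(\dfrac{\gamma}{\gamma - 1}\right) = \bar{x}\left[1 + \dfrac{1}{2\gamma} + \dfrac{1}{3\gamma^2} + \cdots\right] > \bar{x} > \bar{x} - \dfrac{1}{c\gamma}.$

Therefore

$\left(\dfrac{\gamma}{\gamma - 1}\right)^{\gamma\bar{x}} > e^{\bar{x} - \frac{1}{c\gamma}},$

and hence

(39)  $x'_{md.} < \bar{x}'.$

From (38) and (39), we have

(40)  $x'_{mo.} < x'_{md.} < \bar{x}'.$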
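The ordering of mode, median and mean under the transformation $x' = e^x$ can be illustrated by simulation. This is only a sketch under assumptions: the parameter values ($\gamma = 6$, $\bar{x} = 1$), the seed, the sample size and the function name are all hypothetical, the mode is taken as $e^{(\gamma\bar{x}-1)/(\gamma+1)}$ and the mean as $(\gamma/(\gamma-1))^{\gamma\bar{x}}$, and the median is estimated from the sample rather than computed exactly.

```python
import math
import random
import statistics

def log_transform_averages(gamma, xbar, size=100_000, seed=0):
    """Mode and mean of x' = exp(x) in closed form, median and mean by simulation."""
    k = gamma * xbar
    mode = math.exp((k - 1.0) / (gamma + 1.0))
    mean = (gamma / (gamma - 1.0)) ** k        # finite only when gamma > 1
    rng = random.Random(seed)
    # random.gammavariate(shape, scale): shape k and scale 1/gamma give density (1).
    sample = [math.exp(rng.gammavariate(k, 1.0 / gamma)) for _ in range(size)]
    return mode, statistics.median(sample), mean, statistics.fmean(sample)

mode, sample_median, mean, sample_mean = log_transform_averages(gamma=6.0, xbar=1.0)
print(mode, sample_median, mean, sample_mean)
```

A value of $\gamma$ well above 1 is chosen so that the transformed variate has several finite moments and the sample mean settles quickly.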

We shall now investigate the locations of the various averages as related to
the upper and lower points of inflection whose abscissas will be denoted by $l'_u$
and $l'_l$ respectively. Taking the second derivative,

$\dfrac{d^2f(x')}{dx'^2} = f(x')(x' \log x')^{-2}\left\{(\gamma\bar{x} - 1)(\gamma\bar{x} - 2) - (2\gamma^2\bar{x} + 3\gamma\bar{x} - 2\gamma - 3)\log x' + (\gamma + 1)(\gamma + 2)(\log x')^2\right\}.$





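The points of inflection are given by

(41)  $x' = e^{\varphi(\gamma, \bar{x})},$

where

$\varphi(\gamma, \bar{x}) = \dfrac{(\gamma\bar{x} - 1)(2\gamma + 3) \pm \sqrt{(\gamma\bar{x} - 1)\{(\gamma\bar{x} - 1) + 4(\gamma + 1)(\gamma + 2)\}}}{2(\gamma + 1)(\gamma + 2)}.$

(a). To show $x'_{mo.} > l'_l$.

We have,

$\dfrac{\gamma\bar{x} - 1}{\gamma + 1} > \dfrac{(\gamma\bar{x} - 1)(2\gamma + 3) - (\gamma\bar{x} - 1)(\rho + 1)}{2(\gamma + 1)(\gamma + 2)},$

where

$\rho + 1 = \sqrt{1 + \dfrac{4(\gamma + 1)(\gamma + 2)}{\gamma\bar{x} - 1}}$, since $2\gamma + 4 > 2\gamma + 2 - \rho$.

Therefore

$e^{\frac{\gamma\bar{x} - 1}{\gamma + 1}} > l'_l.$

(b). To show $x'_{mo.} < l'_u$.

We have,

$\dfrac{\gamma\bar{x} - 1}{\gamma + 1} < \dfrac{(\gamma\bar{x} - 1)(2\gamma + 3) + (\gamma\bar{x} - 1)(\rho + 1)}{2(\gamma + 1)(\gamma + 2)},$

since $2\gamma + 4 < 2\gamma + 4 + \rho$.
Therefore,

$e^{\frac{\gamma\bar{x} - 1}{\gamma + 1}} < l'_u.$

From (a) and (b), we have

(42)  $l'_l < x'_{mo.} < l'_u.$

(c). To show $x'_{md.} > l'_l$.

We have,

$x'_{md.} > x'_{mo.}$ and $x'_{mo.} > l'_l$.

Consequently

$x'_{md.} > l'_l.$

(d). To show the conditions under which $x'_{md.}$ is less than or greater than $l'_u$.
Upon simplifying the inequality we find that $e^{\varphi_u(\gamma, \bar{x})}$ will be greater or less
than $e^{\bar{x} - \frac{1}{c\gamma}}$ according as the expression

(43)  $-2\bar{x}^2 + \bar{x}\left\{\dfrac{1}{c}\left(3 + \dfrac{4}{\gamma}\right) + (\gamma - 3)\right\} + \dfrac{1}{c}\left(2 + \dfrac{3}{\gamma}\right) - \dfrac{1}{c^2}\left(1 + \dfrac{3}{\gamma} + \dfrac{2}{\gamma^2}\right) - 2$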





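is positive or negative. But (43) will be negative for all values of $\bar{x}$ if its discriminant
is negative or zero. Upon further examination it can be seen that the
discriminant will be negative or zero according as

(44)  $\gamma^2 - 6\gamma\left(1 - \dfrac{1}{c}\right) + \dfrac{1}{c}\left(6 + \dfrac{1}{c}\right) - 7 \le 0.$

The quadratic equation in $\gamma$, given by (44), factors into

$\{\gamma - (A - B)\}\{\gamma - (A + B)\} = 0,$

where

$A = 3\left(1 - \dfrac{1}{c}\right), \quad B = \sqrt{9\left(1 - \dfrac{1}{c}\right)^2 - \dfrac{1}{c}\left(6 + \dfrac{1}{c}\right) + 7}, \quad B > A.$

Hence in order that $x'_{md.}$ be greater than $l'_u$ for all values of $\bar{x}$, we must have

$\gamma \le A + B.$

When c lies in the neighborhood of 3, $\gamma$ must be less than 5. Proceeding further,
we can divide (43) by negative 2, reverse the inequalities, and factor the expression
into

$\left\{\bar{x} - \dfrac{A' - B'}{4}\right\}\left\{\bar{x} - \dfrac{A' + B'}{4}\right\},$

where

$A' = \dfrac{1}{c}\left(3 + \dfrac{4}{\gamma}\right) + (\gamma - 3),$

$B' = \sqrt{A'^2 + 8\left\{\dfrac{1}{c}\left(2 + \dfrac{3}{\gamma}\right) - \dfrac{1}{c^2}\left(1 + \dfrac{3}{\gamma} + \dfrac{2}{\gamma^2}\right) - 2\right\}}, \quad B' < A'.$

Then

(45)  $x'_{md.} > l'_u$, if $\bar{x} < \dfrac{A' - B'}{4}$ or $\bar{x} > \dfrac{A' + B'}{4}$,

and

(46)  $x'_{md.} < l'_u$, if $\dfrac{A' - B'}{4} < \bar{x} < \dfrac{A' + B'}{4}$.

(e). To show $l'_u < \bar{x}'$.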





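We have, $l'_l < \bar{x}'$ since $\bar{x}' > x'_{md.} > x'_{mo.}$, and $x'_{mo.} > l'_l$. Also, $l'_u < \bar{x}'$ for
those values of $\gamma$ and $\bar{x}$ for which $l'_u < x'_{md.}$. It remains to be shown that $\bar{x}'$ is
always greater than $l'_u$ for all values of $\gamma$ and $\bar{x}$. To show that

$e^{\varphi_u(\gamma, \bar{x})} < \left(\dfrac{\gamma}{\gamma - 1}\right)^{\gamma\bar{x}}$

it will be sufficient to show that $\varphi_u(\gamma, \bar{x}) < \bar{x}$, since

$\gamma\bar{x} \log\left(\dfrac{\gamma}{\gamma - 1}\right) = \bar{x}\left(1 + \dfrac{1}{2\gamma} + \dfrac{1}{3\gamma^2} + \cdots\right).$

The inequality is satisfied if

$(\gamma\bar{x} - 1)\{(\gamma\bar{x} - 1) + 4(\gamma + 1)(\gamma + 2)\} < \{2\bar{x}(\gamma + 1)(\gamma + 2) - (\gamma\bar{x} - 1)(2\gamma + 3)\}^2.$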
This expression, however, reduces to the condition that we must have 

-2 - 7 * - 0 $ ~ &yfl S - 7*35 - &E* < 0 , 
which is always true. Hence

$e^{\varphi_u(\gamma, \bar{x})} < \left(\dfrac{\gamma}{\gamma - 1}\right)^{\gamma\bar{x}}$

and we have

(47)  $l'_u < \bar{x}'.$


2. Contact at the ends of the range. We shall now investigate whether the
frequency function, f(x'), has high contact with the x' axis. The function,
f(x'), vanishes at both limits, and thus it will be sufficient to test the derivatives.
The nth derivative can be expressed in the form of $f(x')(x' \log x')^{-n}$ multiplied
by an nth degree polynomial in log x'. It can easily be shown that each derivative
will vanish at the upper limit, while the nth derivative will vanish at the
lower limit provided $\gamma\bar{x} > n$. Therefore, f(x') does not have high contact at the
lower end of the range.

3. Moments. The sth moment about the origin is given by

(48)  $\mu'_s = y_0 \int_1^\infty (\log x')^{\gamma\bar{x} - 1}\, x'^{-(\gamma - s + 1)}\,dx' = \left(\dfrac{\gamma}{\gamma - s}\right)^{\gamma\bar{x}}, \quad \gamma > s.$

If $\gamma \le s$, then taking k such that $\gamma > sk > 0$, we get in place of (48),

(49)  $\mu'_s = \left(\dfrac{\gamma}{\gamma - sk}\right)^{\gamma\bar{x}}.$





We easily obtain the resulting relationship
( 60 ) 

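The sth moment of x' about the mean is

$\mu_s = y_0 \int_1^\infty \left\{x' - \left(\dfrac{\gamma}{\gamma - 1}\right)^{\gamma\bar{x}}\right\}^s (\log x')^{\gamma\bar{x} - 1}\, x'^{-(\gamma + 1)}\,dx'$

$\quad = y_0 \int_0^\infty \left\{e^x - \left(\dfrac{\gamma}{\gamma - 1}\right)^{\gamma\bar{x}}\right\}^s x^{\gamma\bar{x} - 1} e^{-\gamma x}\,dx.$

If we do not take the value of k to be 1, then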


IV. Transformation into a Normal Distribution

We shall now consider a unimodal probability function y = f(x) with range,
$a \le x \le b$, and shall seek to express x as such a function of t as will transform
y = f(x), into $y = Ce^{-t^2/2}$. For simplicity, we assume that y = f(x) has its
modal value at x = 0, and thus each of the curves y = f(x) and $y = Ce^{-t^2/2}$ has its
one maximum value at the origin.

In y = f(x) let log y = V, then equating densities,¹² we have,

$V - \log C + \dfrac{t^2}{2} = 0.$

Then

(53)  $\dfrac{dV}{dt} + t = 0, \quad \dfrac{d^2V}{dt^2} + 1 = 0, \quad \dfrac{d^nV}{dt^n} = 0, \ n \ge 3.$

¹² If f(x) is a probability function or density of a distribution, then f(x) dx is, to within
infinitesimals of higher order, the probability that a value taken at random will fall into
the interval dx at x.





Under the assumption that x is a function which can be expanded in a Maclaurin
series in powers of t, we shall use equations (53) to determine the values of
$A_n = \left[\dfrac{d^nx}{dt^n}\right]_{t=0}$ in the series

(54)  $x = A_1 t + A_2 \dfrac{t^2}{2!} + A_3 \dfrac{t^3}{3!} + \cdots.$

Let $v_n$ represent $\left[\dfrac{d^nV}{dx^n}\right]_{x=0}$, n = 0, 1, 2, .... Then $v_0 = \log C$, and $v_1 = 0$, since $\dfrac{dV}{dx}$
and $\dfrac{dV}{dt}$ vanish when x = 0, that is when t = 0, since $A_0$ is taken to be zero.
Taking the second derivative,

$\dfrac{d^2V}{dt^2} = \dfrac{d^2V}{dx^2}\left(\dfrac{dx}{dt}\right)^2 + \dfrac{dV}{dx}\dfrac{d^2x}{dt^2} = -1,$

and

$\left[\dfrac{d^2V}{dt^2}\right]_{t=0} = v_2A_1^2 = -1.$

Therefore we have

(55)  $A_1 = (-v_2)^{-1/2}$, when $v_2 < 0$.

Also,

$\left[\dfrac{d^3V}{dt^3}\right]_{t=0} = v_3A_1^3 + 3v_2A_1A_2 = 0,$

and we have

(56)  $A_2 = \dfrac{v_3}{3v_2^2}.$

Similarly,

(57)  $A_3 = \dfrac{(5v_3^2 - 3v_2v_4)(-v_2)^{1/2}}{12v_2^4},$

      $A_4 = \dfrac{-(40v_3^3 - 45v_2v_3v_4 + 9v_2^2v_5)}{45v_2^5},$

      $A_5 = \dfrac{-\{385v_3^4 - 630v_2v_3^2v_4 + 21v_2^2(8v_3v_5 + 5v_4^2) - 24v_2^3v_6\}(-v_2)^{1/2}}{144v_2^7}.$

Though the procedure is straightforward, the work becomes somewhat
involved in determining $A_n$ as n gets larger. For this reason, we proceed in the






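following manner. By the use of Bürmann's¹³ theorem we can write (54) in
the form

(58)  $x = \left[\dfrac{x}{t}\right]_{x=0} t + \left[\dfrac{d}{dx}\left(\dfrac{x}{t}\right)^2\right]_{x=0} \dfrac{t^2}{2!} + \left[\dfrac{d^2}{dx^2}\left(\dfrac{x}{t}\right)^3\right]_{x=0} \dfrac{t^3}{3!} + \cdots.$

But

$t = \sqrt{2}\,(\log C - V)^{1/2},$

where V is a function of x. We have,

$V = \log C, \quad \dfrac{dV}{dx} = 0 \ \text{at}\ x = 0,$

and we may write

$V = \log C + \left[\dfrac{d^2V}{dx^2}\right]_{x=0} \dfrac{x^2}{2!} + \left[\dfrac{d^3V}{dx^3}\right]_{x=0} \dfrac{x^3}{3!} + \cdots.$

Hence,

$\dfrac{x}{t} = \dfrac{x}{\sqrt{-2}}\,(a_2x^2 + a_3x^3 + a_4x^4 + \cdots)^{-1/2}$

$\quad = \dfrac{1}{\sqrt{-2}}\,(a_2 + a_3x + a_4x^2 + \cdots)^{-1/2}$, where $a_n = \dfrac{v_n}{n!}$.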
We can now write, 

©■-MsM 4 

But 


A n =□(jjj =» (n —1)1 multiplied by the coefficient of g*^ 1 in (60). 


Hence, 

ri«-(tt-DIGtoa) 


-> y (-.)(-i-■)(-!-»)■■■ (-;->+■) 

( 60 ) " ’ """" " X,IX,IX,IX,I 

©w-(?r 

where the summation is over all values of X 4 such that 

jl, « 

and p«zjV 


A"1 


¹³ A. De Morgan, Differential and Integral Calculus, (1842) page 906.





This expression can be written in the form, 


n n— 2 


(61) 


A. = (»-l)l(2<h)'5 2(-l) rt, 5. . 

•-o 2 4 


hi n n + 2 n + 4 


6 

74 + 28 D n_, ' , 0S )1 


2(a + l) o{+i 

where D is the derivative operator of Arbogast.¹⁴
If we take expression (1) as our function of x, with the origin at the mode, we obtain,

$a_n = \dfrac{(-1)^{n-1}}{n} \cdot \dfrac{\gamma^n}{(\gamma\bar{x} - 1)^{n-1}}, \quad n > 1,$


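which gives

(62)  $x = \dfrac{(\gamma\bar{x} - 1)}{\gamma} + \dfrac{(\gamma\bar{x} - 1)^{1/2}}{\gamma}\,t + \dfrac{2}{3\gamma}\,\dfrac{t^2}{2!} + \dfrac{1}{6\gamma(\gamma\bar{x} - 1)^{1/2}}\,\dfrac{t^3}{3!} - \dfrac{4}{45\gamma(\gamma\bar{x} - 1)}\,\dfrac{t^4}{4!}$

$\qquad + \dfrac{1}{36\gamma(\gamma\bar{x} - 1)^{3/2}}\,\dfrac{t^5}{5!} + \cdots + A_n\dfrac{t^n}{n!} + \cdots,$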


where A, ■» (»—1) I —— - - multiplied by the coefficient of • 

2V 

[-nr - 

This series is known to diverge for large values of t. However, the series is
defined for those values of t that correspond to x for the interval $0 < x < 2\left(\bar{x} - \dfrac{1}{\gamma}\right)$.
With the aid of (22) and Salvosa's¹⁵ tables we give in Table IV the percentage of
the total population which is included in this interval.
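The leading terms of series (62) can be checked against the requirement that the transformed ordinate equal the normal ordinate, that is $\log f(x) - \log f(x_{mo.}) = -t^2/2$. The sketch below is illustrative: it writes the series as $x = x_{mo.}(1 + \tau + \tau^2/3 + \tau^3/36 - \tau^4/270 + \tau^5/4320)$ with $\tau = t/\sqrt{\gamma\bar{x} - 1}$, which is the same five terms regrouped, and the function names and parameter values are hypothetical.

```python
import math

# Multipliers of tau, tau^2, ..., tau^5 after factoring out x_mo = (gamma*xbar - 1)/gamma.
COEFFS = (1.0, 1.0 / 3.0, 1.0 / 36.0, -1.0 / 270.0, 1.0 / 4320.0)

def series_x(t, gamma, xbar, terms=5):
    """Truncated series for x as a function of the normal deviate t."""
    big_k = gamma * xbar - 1.0
    tau = t / math.sqrt(big_k)
    s = sum(c * tau ** (i + 1) for i, c in enumerate(COEFFS[:terms]))
    return (big_k / gamma) * (1.0 + s)

def log_ordinate_drop(x, gamma, xbar):
    """log f(x) - log f(mode) for the Type III density (1)."""
    big_k = gamma * xbar - 1.0
    x_mo = big_k / gamma
    return big_k * math.log(x / x_mo) - gamma * (x - x_mo)

for t in (-2.0, -1.0, 1.0, 2.0):
    x = series_x(t, gamma=1.0, xbar=100.0)
    print(t, log_ordinate_drop(x, 1.0, 100.0), -0.5 * t * t)
```

With $\gamma\bar{x} = 100$ the two printed columns agree to several decimals, which is consistent with the remark that comparatively few terms suffice for moderate skewness.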


TABLE IV

The percentage of the population, characterized by (1), which is included in the interval
$0 < x < 2(\bar{x} - 1/\gamma)$, for different degrees of skewness

$\alpha_3$                 1.1     1.0     .9      .8    .7    .6    .5    .4    .3    .2    .1
Percent of Population   79.380    ...   89.781   ...   ...   ...   ...   ...   ...   ...   100.000



Thus, in dealing with samples as large as 10,000, with moderate degrees of 
skewness, the probability of getting a value that falls beyond this interval 


¹⁴ Augustus De Morgan, On Arbogast's Formulae of Expansion, Cambridge and Dublin
Mathematical Journal, Vol. 1, (1846) pp. 238-255.

¹⁵ Cf. Salvosa, loc. cit. page 2.


















becomes negligible. Hence it may be expected that with the use of a comparatively
few terms of series (62) we may transform the ordinates of a moderately
skew Type III function to within close approximations of the ordinates of the
normal function.

Baker¹⁶ considered the transformations of a non-normal frequency distribution
represented by f(t) where the origin is taken at some central point and the
scale is the standard deviation of the distribution. By equating probabilities
he found a function such that by setting $t = \varphi(u)$, he obtained

$f[\varphi(u)]\,\varphi'(u)\,du = \dfrac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\,du.$

It seemed of interest to compare the results obtained by applying transformation
(62), which is found by equating densities, to the illustration treated by
Baker,¹⁷ where the transformation giving equality of probabilities was used.

The example treated was

(63)  $f(t) = .3986\left(1 + \dfrac{t}{10}\right)^{99} e^{-10t}.$

This is a Type III distribution of form (24), with $\alpha_3 = .2$. From (22),
$\gamma\bar{x} = \dfrac{4}{\alpha_3^2} = 100$, and from (62) we obtain the series

(64)  $\gamma x = 99(1 + .1005038u + .0033670u^2 + .0000282u^3 - .0000004u^4 + \cdots).$

We shall utilize only the first four terms and rewrite (64) in the form

$\gamma x = 99(1 + .1005038u + .0033670u^2 + .0000282u^3).$

However, from (23),

$x = \bar{x}\left(1 + \dfrac{\alpha_3}{2}t\right),$

which gives

$\gamma x = \gamma\bar{x}\left(1 + \dfrac{\alpha_3}{2}t\right) = 100(1 + .1t).$

Therefore,

$t = .1(\gamma x - 100).$


16 G. A. Baker, Transformation of Non-normal Frequency Distributions into Normal Distributions, Annals of Mathematical Statistics, Vol. 5, (1934) pp. 113-133.

17 Baker, loc. cit. page 117.





With the aid of Salvosa's$^{18}$ tables, we obtain the following results.

TABLE V

Comparison of the ordinates of the normal function, the function with skewness .2, and the skewed function transformed by
$t = 9.9(-.0101010 + .1005038u + .0033670u^2 + .0000282u^3)$

   u     Normal curve   Function with   Transformed   Transformed skew
                        skewness .2     skew curve    curve minus normal
 -2.0      .053991        .049243        .054226         .000236
 -1.8      .078950        .076810        .079291           ...
 -1.6      .110921        .112956        .111393           ...
 -1.4      .149727        .157043        .150359           ...
 -1.2      .194186        .206951        .195003           ...
 -1.0      .241971        .259120        .242986         .001015
 - .8      .289692        .308968        .290905           ...
 - .6      .333225        .351538        .334618           ...
 - .4      .368270        .382453        .369811           ...
 - .2      .391043        .398583        .392678         .001635
   0       .398942        .398610        .400615         .001673
   .2      .391043        .383157        .392682         .001639
   .4      .368270        .354546        .369811         .001541
   .6      .333225        .316273        .334621         .001396
   .8      .289692        .272360        .290906           ...
  1.0      .241971        .226714        .242984           ...
  1.2      .194186        .182641        .194999         .000813
  1.4      .149727        .142563        .150353           ...
  1.6      .110921        .107939        .111383           ...
  1.8      .078950        .079354        .079277           ...
  2.0      .053991        .056702        .054214           ...


The ordinates of the transformed distribution are more symmetrical and approximate the ordinates of the normal curve more closely than the values obtained by Baker, even though we have used only four terms in the transforming series.
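The comparison in Table V can be retraced numerically. The following is only a sketch under stated assumptions, not the author's computation: it assumes that the "function with skewness .2" is the standardized gamma density with shape $\gamma^2 = 100$, that the "transformed skew curve" is that density evaluated at $t(u)$ without a Jacobian factor, and it uses the four-term transformation quoted in the table caption. All function names are illustrative.

```python
import math

GAMMA2 = 100.0                               # gamma^2 = 4 / alpha_3^2 for alpha_3 = 0.2
COEFFS = (0.1005038, 0.0033670, 0.0000282)   # coefficients of u, u^2, u^3 in the series


def normal_ordinate(u):
    """Ordinate of the standard normal curve."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def skew_ordinate(t):
    """Standardized Type III ordinate with skewness 0.2 (assumed gamma, shape 100)."""
    z = GAMMA2 + 10.0 * t                    # gamma variate: 100 (1 + 0.1 t)
    if z <= 0.0:
        return 0.0
    log_pdf = (GAMMA2 - 1.0) * math.log(z) - z - math.lgamma(GAMMA2)
    return 10.0 * math.exp(log_pdf)          # factor 10 = dz/dt


def t_of_u(u):
    """Four-term transformation from the caption of Table V."""
    s = COEFFS[0] * u + COEFFS[1] * u ** 2 + COEFFS[2] * u ** 3
    return 9.9 * (-0.0101010 + s)


def table_row(u):
    """Return (u, normal, skew, transformed, transformed - normal)."""
    normal = normal_ordinate(u)
    skew = skew_ordinate(u)
    transformed = skew_ordinate(t_of_u(u))
    return u, normal, skew, transformed, transformed - normal


if __name__ == "__main__":
    for k in range(-10, 11):
        print("%5.1f  %.6f  %.6f  %.6f  %.6f" % table_row(0.2 * k))
```

Under these assumptions the transformed ordinates are very nearly a constant multiple of the normal ordinates, which is the near-symmetry the text describes.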

Returning to the general case, we may write



18 Salvosa, loc. cit. pp. 04 et seq.












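provided the series converges for all values of t. Under the assumption that the integrand satisfies conditions for the term by term integration of the series, we get

(66) $$1 = \int y\,dx = C\sqrt{2\pi}\left[A_1 + \frac{A_3}{2} + \cdots + \frac{A_{2n+1}}{2^n\,n!} + \cdots\right].$$

The area to the right of the modal ordinate is

$$\int_{A_0}^{\infty} y\,dx = C\int_0^{\infty} e^{-t^2/2}\left(A_1 + A_2 t + \frac{A_3 t^2}{2!} + \cdots\right)dt$$

$$= \tfrac{1}{2} + C\int_0^{\infty} e^{-t^2/2}\left(A_2 t + \frac{A_4 t^3}{3!} + \cdots + \frac{A_{2n} t^{2n-1}}{(2n-1)!} + \cdots\right)dt$$

(67) $$= \tfrac{1}{2} + C\left[A_2 + \frac{A_4}{3} + \cdots\right].$$

Hence the area from the mode to the median is

(68) $$C\left[A_2 + \frac{A_4}{3} + \cdots\right].$$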

Let us consider distribution (1) again. The coefficients in series (62) are functions of the skewness, and become smaller with smaller degrees of skewness. Indications are that with moderate skewness, the series converges sufficiently to be used for certain formal purposes. If we assume this and proceed in a formal manner we obtain some interesting results that are consistent with approximations that have been obtained elsewhere.

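Thus, it is interesting to note that using the coefficients of series (62) in equation (66), we obtain

$$\Gamma(\gamma^2) = \sqrt{2\pi(\gamma^2 - 1)}\,(\gamma^2 - 1)^{\gamma^2 - 1}\,e^{-(\gamma^2 - 1)}\left\{1 + \frac{1}{12(\gamma^2 - 1)} + \frac{1}{288(\gamma^2 - 1)^2} + \cdots\right\},$$

which is Stirling's asymptotic form for $\Gamma(\gamma^2)$.$^{19}$
From (68), the area from the mode to the median in the Type III distribution characterized by (1) is approximately

(69) $$C\left(\frac{2}{3\gamma^2} - \frac{4}{135\,\gamma^2(\gamma^2 - 1)} + \cdots\right),$$

where

$$C = \frac{\gamma^2(\gamma^2 - 1)^{\gamma^2 - 1}\,e^{-(\gamma^2 - 1)}}{\Gamma(\gamma^2)}.$$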
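The Stirling form quoted above is easy to check numerically. The sketch below is illustrative only: it evaluates the three-term series with $m = z - 1$ standing for $\gamma^2 - 1$ and compares it with the library Gamma function; the function name and the choice of test argument are assumptions made for the example.

```python
import math


def stirling_gamma(z, terms=3):
    """Stirling's asymptotic form for Gamma(z), written in powers of 1/(z - 1)."""
    m = z - 1.0
    series = [1.0, 1.0 / (12.0 * m), 1.0 / (288.0 * m * m)]
    leading = math.sqrt(2.0 * math.pi * m) * m ** m * math.exp(-m)
    return leading * sum(series[:terms])


if __name__ == "__main__":
    z = 11.0
    for terms in (1, 2, 3):
        rel = stirling_gamma(z, terms) / math.gamma(z) - 1.0
        print(terms, "term(s): relative error", rel)
```

Even for a modest argument the relative error falls quickly as terms are added, in line with the remark that the series is usable for formal purposes when $\gamma^2 - 1$ is large.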


19 E. Czuber, Wahrscheinlichkeitsrechnung, Volume 1, (1908) pp. 23-24.





Since $(\gamma^2 - 1)$ is large when the skewness is moderate, and since the terms of (69) are rapidly decreasing, the area from the mode to the median is approximately equal to $\frac{2C}{3\gamma^2}$. But $\frac{1}{\gamma^2}$ is the distance from the mode to the mean and C is the ordinate at the mode, hence the area from the mode to the median is approximately equal to the ordinate at the mode multiplied by 2/3 of the distance between the mode and mean. Therefore with moderate skewness the median lies approximately 2/3 of the distance from the mode to the mean, which conforms to the approximate result first obtained by Karl Pearson$^{20}$ for the Type III distribution. We may, for all cases resulting in (68), take $A_2$ as being approximately equal to the distance from the mode to the median. This becomes somewhat more apparent by finding the arithmetic mean of distribution y. Thus,


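$$\bar{x} = \frac{\int x\,y\,dx}{\int y\,dx} = \frac{C\int_{-\infty}^{\infty} e^{-t^2/2}\left(A_0 + A_1 t + \frac{A_2 t^2}{2!} + \cdots\right)\left(A_1 + A_2 t + \frac{A_3 t^2}{2!} + \cdots\right)dt}{C\sqrt{2\pi}\left(A_1 + \frac{A_3}{2!} + \frac{3A_5}{4!} + \cdots\right)}$$

$$= \frac{A_0\left(A_1 + \frac{A_3}{2!} + \frac{3A_5}{4!} + \cdots\right) + \frac{3A_1A_2}{2!} + 3\left(\frac{5A_1A_4}{4!} + \frac{10A_2A_3}{4!}\right) + \cdots}{A_1 + \frac{A_3}{2!} + \frac{3A_5}{4!} + \cdots}$$

(70) $$= A_0 + \frac{3}{2}A_2 + \cdots.$$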


Remembering that $A_0$ is the abscissa of the mode, it becomes apparent that the mean is, in general, approximately equal to the mode plus 3/2 of the distance from the mode to the median.
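The rule just stated (equivalently, that the median lies about 2/3 of the way from the mode to the mean) can be checked directly for a gamma variate. The sketch below is an illustration, not part of the original argument: it works with the unscaled gamma variate of shape k (mode k - 1, mean k), finds the median by bisection on a Simpson-rule distribution function, and compares it with mode + 2/3 (mean - mode). Function names, step counts and the restriction to k of 2 or more are assumptions of the example.

```python
import math


def gamma_pdf(x, k):
    """Density of the gamma variate with shape k and unit scale."""
    if x <= 0.0:
        return 0.0
    return math.exp((k - 1.0) * math.log(x) - x - math.lgamma(k))


def gamma_cdf(x, k, steps=2000):
    """Composite Simpson approximation to the distribution function (steps even)."""
    h = x / steps
    total = gamma_pdf(0.0, k) + gamma_pdf(x, k)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * gamma_pdf(i * h, k)
    return total * h / 3.0


def gamma_median(k):
    """Median by bisection between the mode (k - 1) and the mean (k); use k >= 2."""
    lo, hi = k - 1.0, k
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, k) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pearson_median(k):
    """Approximate median: mode plus 2/3 of the distance from mode to mean."""
    mode, mean = k - 1.0, k
    return mode + (2.0 / 3.0) * (mean - mode)


if __name__ == "__main__":
    for k in (4.0, 25.0, 100.0):
        print(k, gamma_median(k), pearson_median(k))
```

The agreement improves as the shape grows, that is, as the skewness becomes more moderate.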

Though series (62) is known not to converge for large values of t, it is interesting to note that if we use distribution (1) for y, we have from (70)

(71) $$\bar{x} = \left(1 - \frac{1}{\gamma^2}\right) + \frac{3}{2}\left(\frac{2}{3\gamma^2}\right) + \cdots,$$

the first two terms of which give 1, which is the mean, and hence if (71) were an exact formula, the sum of the terms beyond the second would be zero.

For example (63), it can be seen from the following that $A_2$ furnishes a close approximation to the distance from the mode to the median. Here, $t = .1(\gamma^2 x - 100)$; putting $x = x_{mo} = \frac{\gamma^2 - 1}{\gamma^2}$, we have $t_{mo} = .1(99 - 100) = -.1$. Putting $x = x_{md} \approx x_{mo} + A_2$, where $A_2 = \frac{2}{3\gamma^2}$, we have as our approximation to the median $t \approx -.0333$.

Interpolating in the Salvosa tables, we find for $\alpha_3 = .2$, $t_{md} = -.03331$ approximately. Hence it is seen that the interpolated value checks very closely with that obtained by using the $A_2$ criterion.

20 Karl Pearson, loc. cit.

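We shall now consider briefly the transforming series, when for y, we take distribution (4). Then, corresponding to (54), we obtain the series

(72) $$x = \frac{(\gamma^2 - n)^n}{\gamma^{2n}} + \frac{n(\gamma^2 - n)^{n - \frac{1}{2}}}{\gamma^{2n}}\,t + \frac{n(3n - 1)(\gamma^2 - n)^{n-1}}{3\gamma^{2n}}\,\frac{t^2}{2!} + \frac{n(6n^2 - 6n + 1)(\gamma^2 - n)^{n - \frac{3}{2}}}{6\gamma^{2n}}\,\frac{t^3}{3!} + \frac{n(45n^3 - 90n^2 + 45n - 4)(\gamma^2 - n)^{n-2}}{45\gamma^{2n}}\,\frac{t^4}{4!} + \cdots.$$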

When n = 1, (72) reduces to the series given by (62). Suppose we are primarily interested in the cases for which 0 < n < 1. For these cases the coefficients of (72) decrease more rapidly than do those of series (62). Under the same assumptions as to convergence which were made in working with the latter series we have, from (68) and (72), the area from the mode to the median given approximately by

(73) $$C\left[\frac{n(3n - 1)(\gamma^2 - n)^{n-1}}{3\gamma^{2n}} + \frac{n(45n^3 - 90n^2 + 45n - 4)(\gamma^2 - n)^{n-2}}{135\gamma^{2n}} + \cdots\right].$$

When 0 < n < 1 we always have $\gamma^2 > n$; then $A_2 > 0$ if n > 1/3, and $A_2 < 0$ if n < 1/3. Therefore, if $A_2$ is taken to be approximately equal to the distance from the mode to the median, we have $x_{mo} > x_{md}$ if n < 1/3, and $x_{mo} < x_{md}$ if n > 1/3, since $A_2$ is positive or negative according as n is greater or less than 1/3. Combining these results with (70), we have $\bar{x} > x_{mo}$ if $x_{mo} < x_{md}$, and $\bar{x} < x_{mo}$ if $x_{mo} > x_{md}$, which are the relations given in Section IIa, for case II.



A TEST OP THE SIGNIFICANCE OF THE DIFFERENCE BETWEEN 
MEANS OF SAMPLES FROM TWO NORMAL POPULATIONS 
WITHOUT ASSUMING EQUAL VARIANCES 1 


By Daisy M. Starkey


1. History of the problem. If the only available evidence about two normally distributed populations is contained in two samples, one from each, it has hitherto been the custom to test the hypothesis that the means are equal by assuming that the quantity $\frac{\bar{x} - \bar{x}'}{\sqrt{s^2 + s'^2}}$ is distributed in Student's distribution, with $N + N' - 2$ degrees of freedom, where

$$s^2 = \frac{\Sigma(x - \bar{x})^2}{N(N - 1)} \quad\text{and}\quad s'^2 = \frac{\Sigma(x' - \bar{x}')^2}{N'(N' - 1)},$$

the other notation being that used by R. A. Fisher.$^{2}$ The hypothesis underlying this test, however, is that the variances are equal. Although in many cases this may seem a reasonable assumption to adopt concurrently with that of equal means, it is undoubtedly not a necessary one, and it is, therefore, desirable that the test should be adapted to meet this difficulty.

The first advance on the problem was made by W. V. Behrens$^{3}$ who suggested that the distribution of the difference of the means could be expressed in terms of the observations in the samples from the two populations, the argument being entirely independent of the variances. R. A. Fisher$^{4}$ obtained substantially the same result, but expressed the argument in terms of fiducial probability. M. S. Bartlett$^{5}$ was of the opinion that Behrens' reasoning was incorrect, as he obtained some results which were apparently inconsistent with those tabulated in Behrens' paper, but R. A. Fisher$^{6}$ showed that Bartlett's argument was open to criticism. In the latter work, he actually obtained distributions for the case of two samples of two observations, and in the following we shall indicate some extensions of this more detailed work of Fisher, firstly, to the case


1 Presented at the joint meeting of the American Mathematical Society and the Institute of Mathematical Statistics, Indianapolis, December 30, 1937. Research done under a grant-in-aid from the Carnegie Corporation of New York City.

2 Statistical Methods for Research Workers, 1925-1936.

3 "Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen," Landw. Jb. 68, 807-37 (1929).

4 "The Fiducial Argument in Statistical Inference," Annals of Eugenics, 6 (1935) pp. 391-8.

5 "The Information Available in Small Samples," Proc. Camb. Phil. Soc. 32, pp. 560-6 (1936).

6 "On a Point Raised by M. S. Bartlett on Fiducial Probability," Annals of Eugenics 7, Part IV, 370-5 (1937).




of other small samples of even numbers of observations, and, secondly, to samples of very large numbers.

2. The case of small samples. We recapitulate, briefly, the preliminary argument of R. A. Fisher,$^{4}$ in which he denotes the unknown population means by $\mu$ and $\mu'$. Since $\frac{\bar{x} - \mu}{s} = t$, where t is distributed in Student's distribution, we may write $\mu = \bar{x} - st$, and obtain the fiducial distribution of the population parameter $\mu$ in the form $G_1(\mu)\,d\mu$, where

$$G_1(\mu) = \frac{1}{s\sqrt{\pi n}}\,\frac{\Gamma[\frac{1}{2}(n + 1)]}{\Gamma(\frac{1}{2}n)}\left(1 + \frac{(\bar{x} - \mu)^2}{ns^2}\right)^{-\frac{1}{2}(n+1)},$$

and a similar result for the fiducial distribution of $\mu'$. The simultaneous fiducial distribution of $\mu$ and $\mu'$ is thus $G_1(\mu)\,G_2(\mu')\,d\mu\,d\mu'$, from which the fiducial distribution of $\mu - \mu'$ may be found. We may note that the characteristic function of $-(\mu - \mu') + (\bar{x} - \bar{x}')$ is $M(x)$, where

$$M(x) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{ix(st - s't')}\,H_1(t)\,H_2(t')\,dt\,dt',$$

where

$$H_1(t) = \frac{1}{\sqrt{\pi n}}\,\frac{\Gamma[\frac{1}{2}(n + 1)]}{\Gamma(\frac{1}{2}n)}\left(1 + \frac{t^2}{n}\right)^{-\frac{1}{2}(n+1)},$$

with a similar expression for $H_2(t')$. Thus from the fiducial point of view, the problem is essentially that of formally determining the distribution of the variate $st - s't'$, or $at + bt'$, where $a = s$, $b = -s'$ are regarded as constants, t and t' being distributed in "Student's" distribution. The hypothesis $\mu = \mu'$ may then be examined by testing whether $\bar{x} - \bar{x}'$ is a significantly large value of this variate. We shall approach this distribution problem through the use of characteristic functions.

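By definition, the characteristic function of "Student's" distribution is represented by the integral

(1) $$\frac{1}{\sqrt{\pi n}}\,\frac{\Gamma[\frac{1}{2}(n + 1)]}{\Gamma(\frac{1}{2}n)}\int_{-\infty}^{\infty}\frac{e^{ixt}}{\left(1 + \frac{t^2}{n}\right)^{\frac{1}{2}(n+1)}}\,dt$$

and may be evaluated by three methods which will be briefly considered.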




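First, by integrating the function

$$\frac{1}{\sqrt{\pi n}}\,\frac{\Gamma[\frac{1}{2}(n + 1)]}{\Gamma(\frac{1}{2}n)}\,\frac{e^{ixz}}{\left(1 + \frac{z^2}{n}\right)^{\frac{1}{2}(n+1)}}$$

around a standard semicircular contour in the upper half of the z-plane, the value of the characteristic function is at once proved to be $2\pi i$ times the sum of the residues of the integrand within the contour when the radius of the semicircle becomes infinite. Within the contour there is one pole only, at $z = i\sqrt{n}$. The residue at this pole is the coefficient of $1/h$ in the expansion of

$$\frac{1}{\sqrt{\pi n}}\,\frac{\Gamma[\frac{1}{2}(n + 1)]}{\Gamma(\frac{1}{2}n)}\,\frac{e^{ix(i\sqrt{n} + h)}}{\left(1 + \frac{(i\sqrt{n} + h)^2}{n}\right)^{\frac{1}{2}(n+1)}}$$

in ascending powers of h, which may readily be evaluated when n is odd.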
Second, by using the result that 




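from which we deduce that

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{ixt}}{a + t^2}\,dt = \frac{e^{-|x|\sqrt{a}}}{\sqrt{a}}.$$

Differentiating this result $\frac{1}{2}(n - 1)$ times with respect to a, again considering odd values of n, we have that

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{ixt}}{(a + t^2)^{\frac{1}{2}(n+1)}}\,dt = \frac{(-1)^{\frac{1}{2}(n-1)}}{[\frac{1}{2}(n - 1)]!}\,\frac{d^{\frac{1}{2}(n-1)}}{da^{\frac{1}{2}(n-1)}}\left(\frac{e^{-|x|\sqrt{a}}}{\sqrt{a}}\right).$$

By forming the first order differential equation in $y = \frac{e^{-|x|\sqrt{a}}}{\sqrt{a}}$ and differentiating it $\frac{1}{2}(n - 3)$ times using Leibniz' theorem, we may obtain a linear relation between the derivatives of order $\frac{1}{2}(n - 1)$ and lower; similarly, by differentiating $\frac{1}{2}(n - 5)$ times, we may obtain a linear relation between the derivatives of all orders up to and including $\frac{1}{2}(n - 3)$, and continuing in this way, we obtain a set of $\frac{1}{2}(n - 1)$ linear equations in the $\frac{1}{2}(n - 1)$ unknown derivatives. These equations may be solved for the derivative of order $\frac{1}{2}(n - 1)$ by the determinant rule. The denominator determinant is independent of x, and the numerator is $e^{-|x|\sqrt{a}}$ multiplied by a polynomial of degree $\frac{1}{2}(n - 1)$ in x. Using this fact, we may specify undetermined values for the coefficients in this polynomial, and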
obtain relations between these values for two consecutive values of n by differentiating once. The recurrence relations thus obtained may be used to verify
by mathematical induction the following relation, after substituting a = n:

$$\frac{1}{\sqrt{\pi n}}\,\frac{\Gamma[\frac{1}{2}(n + 1)]}{\Gamma(\frac{1}{2}n)}\int_{-\infty}^{\infty}\frac{e^{ixt}}{\left(1 + \frac{t^2}{n}\right)^{\frac{1}{2}(n+1)}}\,dt = e^{-|x|\sqrt{n}}\left\{1 + |x|\sqrt{n} + \frac{(|x|\sqrt{n})^2}{2!}\,\frac{(n - 3)}{(n - 2)} + \frac{(|x|\sqrt{n})^3}{3!}\,\frac{(n - 5)}{(n - 2)} + \cdots\right\},$$

the coefficient of $(|x|\sqrt{n})^{2k}$ being

$$\frac{1}{(2k)!}\,\frac{(n - 4k + 1)(n - 4k + 3)(n - 4k + 5)\cdots(n - 2k - 1)}{(n - 2)(n - 4)(n - 6)\cdots(n - 2k)},$$

and the coefficient of $(|x|\sqrt{n})^{2k+1}$ being

$$\frac{1}{(2k + 1)!}\,\frac{(n - 4k - 1)(n - 4k + 1)\cdots(n - 2k - 3)}{(n - 2)(n - 4)\cdots(n - 2k)}.$$

This is, therefore, the value of the characteristic function, and is the same in form as the result which may be obtained by the first method. There are evidently a finite number of terms, the degree of the polynomial being $\frac{1}{2}(n - 1)$.
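The coefficient rule above lends itself to a short computational check. The following sketch, offered only as an illustration, builds the polynomial coefficients for odd n with exact fractions, reading the general-term formulas as written for powers up to the stated degree $\frac{1}{2}(n - 1)$, and evaluates the resulting characteristic function. The function names are hypothetical.

```python
from fractions import Fraction
import math


def student_cf_coefficients(n):
    """Coefficients c_0 .. c_m (m = (n - 1) / 2) of (|x| sqrt(n))^p for odd n."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    degree = (n - 1) // 2
    coeffs = []
    for p in range(degree + 1):
        k, odd = divmod(p, 2)
        # First numerator factor: n - 4k + 1 for even powers, n - 4k - 1 for odd.
        start = n - 4 * k - 1 if odd else n - 4 * k + 1
        numerator = Fraction(1)
        for j in range(k):
            numerator *= start + 2 * j
        denominator = Fraction(math.factorial(p))
        for j in range(1, k + 1):
            denominator *= n - 2 * j
        coeffs.append(numerator / denominator)
    return coeffs


def student_cf(x, n):
    """Characteristic function of Student's distribution for odd n."""
    u = abs(x) * math.sqrt(n)
    poly = sum(float(c) * u ** p for p, c in enumerate(student_cf_coefficients(n)))
    return math.exp(-u) * poly


if __name__ == "__main__":
    for n in (1, 3, 5, 7, 9):
        print(n, [str(c) for c in student_cf_coefficients(n)])
```

For n = 3 this gives $e^{-u}(1 + u)$ and for n = 5 it gives $e^{-u}(1 + u + u^2/3)$, with $u = |x|\sqrt{n}$.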

Third, the characteristic function may be shown to satisfy a second order differential equation. By a change of the dependent variable y (we assume that x is positive, as it may be replaced by its absolute value in the integral) and the substitution $u = x\sqrt{n}$, and using the Frobenius method of solution in series, we obtain as one solution when n is odd the expression

$$1 + u + \frac{u^2}{2!}\,\frac{(n - 3)}{(n - 2)} + \frac{u^3}{3!}\,\frac{(n - 5)}{(n - 2)} + \cdots,$$

and the corresponding value of y has already been proved to be the value of the characteristic function. It is probable that the corresponding solution of the differential equation would also be the value of the characteristic function when n is even. In this case, however, the indicial equation has roots differing by an integer, and the solution of the differential equation is much more complicated in form. Nevertheless, it seems possible to find a series expansion for the characteristic function of "Student's" distribution in this way whatever be the value of n.

The characteristic functions of the distributions of at and bt' may now be readily obtained by replacing x by ax and bx in the above expression. Multiplying the characteristic functions of these two independent distributions, we obtain the characteristic function of the distribution of at + bt', which is of the form

$$e^{-|x|(|a|\sqrt{n} + |b|\sqrt{n'})}\left\{1 + |x|\left(|a|\sqrt{n} + |b|\sqrt{n'}\right) + \cdots\right\},$$

the term in brackets being a polynomial of degree $\frac{n + n' - 2}{2}$. We may now use the result that the distribution is given by the integral

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ixw}\,M(x)\,dx,$$

where $M(x)$ denotes this characteristic function, and so obtain the distribution of w = at + bt'.

A distribution so obtained would involve four constants, a, b, n and n', and a derived probability table would thus be very complicated. It may, however, be simplified firstly by considering the case of equal sample numbers, and, secondly, making the transformation

(3) $$v = \frac{at + bt'}{|a| + |b|},$$

whence the resulting distribution involves only two constants, n, and the ratio a/b. In this case the form of the characteristic function is

(4) $$e^{-|x|\sqrt{n}}\left\{1 + |x|\sqrt{n} + \frac{(|x|\sqrt{n})^2}{2!}\,\frac{(a^2 + b^2)\frac{n - 3}{n - 2} + 2|ab|}{(|a| + |b|)^2} + \cdots\right\}.$$


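In determining the form of the distribution, we shall encounter integrals of the form

$$n^{\frac{1}{2}p}\int_{-\infty}^{\infty} e^{-ixv - |x|\sqrt{n}}\,|x|^p\,dx.$$

This can be reduced to

$$n^{\frac{1}{2}p}\int_0^{\infty} e^{-(\sqrt{n} + iv)x}\,x^p\,dx + n^{\frac{1}{2}p}\int_0^{\infty} e^{-(\sqrt{n} - iv)x}\,x^p\,dx,$$

and, integrating by parts, or using the Gamma Function integral, we obtain as the value of this integral

$$n^{\frac{1}{2}p}\,p!\left[\frac{1}{(\sqrt{n} + iv)^{p+1}} + \frac{1}{(\sqrt{n} - iv)^{p+1}}\right].$$

Writing $v = \sqrt{n}\tan\theta$, this reduces to

$$\frac{2\,p!}{\sqrt{n}}\,\cos(p + 1)\theta\,\cos^{p+1}\theta.$$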





The distribution is thus seen to be:

(5) $$\frac{1}{\pi}\left[p_0 + p_1\cos 2\theta + p_2\cos\theta\cos 3\theta + \cdots + p_{n-1}\cos^{n-2}\theta\cos n\theta\right]d\theta,$$

where

$$p_0 = 1,\quad p_1 = 1,\quad p_2 = \frac{(a^2 + b^2)\frac{n - 3}{n - 2} + 2|ab|}{(|a| + |b|)^2},\ \ldots$$

It is obvious that the values of the coefficients p may all be expressed in terms of the ratio $\frac{a}{b}$. Denoting this ratio by r,

$$p_2 = \frac{(r^2 + 1)\frac{n - 3}{n - 2} + 2r}{(r + 1)^2},$$

and thus we could construct a table for the probability integral involving n, r and v only.

The process of evaluating the probability integral may be simplified by considering the term already evaluated,

$$n^{\frac{1}{2}p}\int_{-\infty}^{\infty} e^{-ixv - |x|\sqrt{n}}\,|x|^p\,dx.$$

Integrating this expression under the integral sign with respect to v, between the limits v and $\infty$, we obtain the contribution to the probability integral from this term, which on the introduction of the same transformation as before, gives the value

$$-2(p - 1)!\,\cos^p\theta\,\sin p\theta.$$

Thus, from (5), the total probability that $\theta$ should lie between $\frac{\pi}{2}$ and a given value, $\theta$, is

(6) $$\frac{1}{\pi}\left\{\frac{\pi}{2} - \theta - p_1\cos\theta\sin\theta - \frac{p_2}{2}\cos^2\theta\sin 2\theta - \cdots - \frac{p_{n-1}}{n - 1}\cos^{n-1}\theta\sin(n - 1)\theta\right\},$$

where $\tan\theta = v/\sqrt{n}$.

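The following summarises the results for small values of n.

1. n = 1. $\tan\theta = \frac{at + bt'}{|a| + |b|}$.

The results reduce to those already given by Fisher. The distribution is then simply $\frac{d\theta}{\pi}$, or Student's distribution, and is independent of a and b, and the probability integral is $\frac{1}{2} - \frac{\theta}{\pi}$.

2. n = 3. $\tan\theta = \frac{at + bt'}{\sqrt{3}\,(|a| + |b|)}$.

The distribution function is

$$\frac{d\theta}{\pi}\left[1 + \cos 2\theta + \frac{2r}{(1 + r)^2}\cos\theta\cos 3\theta\right],$$

and the probability integral

$$\frac{1}{\pi}\left[\frac{\pi}{2} - \theta - \cos\theta\sin\theta - \frac{r}{(1 + r)^2}\cos^2\theta\sin 2\theta\right].$$

3. n = 5. $\tan\theta = \frac{at + bt'}{\sqrt{5}\,(|a| + |b|)}$.

The distribution function is

$$\frac{d\theta}{\pi}\left[1 + \cos 2\theta + \frac{2}{3}\,\frac{(r^2 + 1 + 3r)}{(1 + r)^2}\cos\theta\cos 3\theta + \frac{2r}{(1 + r)^2}\cos^2\theta\cos 4\theta + \frac{8r^2}{3(1 + r)^4}\cos^3\theta\cos 5\theta\right],$$

and the probability integral

$$\frac{1}{\pi}\left[\frac{\pi}{2} - \theta - \cos\theta\sin\theta - \frac{1}{3}\,\frac{(r^2 + 1 + 3r)}{(1 + r)^2}\cos^2\theta\sin 2\theta - \frac{2r}{3(1 + r)^2}\cos^3\theta\sin 3\theta - \frac{2r^2}{3(1 + r)^4}\cos^4\theta\sin 4\theta\right].$$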
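The n = 3 result can be compared with a direct numerical convolution of two Student densities. The sketch below is an illustration under stated assumptions: a and b are taken positive, r = a/b, the density of v is read off from the bracketed expression with $\theta = \tan^{-1}(v/\sqrt{3})$, and the probability integral is taken as the probability of exceeding v. All function names, the integration range and the step counts are choices made for the example.

```python
import math

SQRT3 = math.sqrt(3.0)


def student3_pdf(t):
    """Student's density for n = 3."""
    return 2.0 / (math.pi * SQRT3) / (1.0 + t * t / 3.0) ** 2


def density_formula(v, r):
    """Density of v = (a t + b t') / (a + b) for n = 3, with r = a / b > 0."""
    theta = math.atan(v / SQRT3)
    bracket = (1.0 + math.cos(2.0 * theta)
               + 2.0 * r / (1.0 + r) ** 2 * math.cos(theta) * math.cos(3.0 * theta))
    # d(theta)/dv = cos^2(theta) / sqrt(3)
    return bracket / math.pi * math.cos(theta) ** 2 / SQRT3


def upper_tail_formula(v, r):
    """Probability of exceeding v, from the closed form for n = 3."""
    theta = math.atan(v / SQRT3)
    inner = (0.5 * math.pi - theta - math.cos(theta) * math.sin(theta)
             - r / (1.0 + r) ** 2 * math.cos(theta) ** 2 * math.sin(2.0 * theta))
    return inner / math.pi


def density_numeric(v, r, half_width=200.0, steps=40000):
    """Simpson-rule convolution of the two Student densities (steps even)."""
    alpha, beta = r / (1.0 + r), 1.0 / (1.0 + r)   # v = alpha t + beta t'
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        t = -half_width + i * h
        weight = 1.0 if i in (0, steps) else (4.0 if i % 2 else 2.0)
        total += weight * student3_pdf(t) * student3_pdf((v - alpha * t) / beta) / beta
    return total * h / 3.0


if __name__ == "__main__":
    for v in (0.0, 0.7, 2.0):
        print(v, density_formula(v, 2.0), density_numeric(v, 2.0))
```

Letting r grow large (b negligible beside a) reduces the closed form to Student's density for n = 3, which is a useful sanity check on the coefficient of the third term.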


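3. Samples of large numbers. The foregoing method is not suitable when n and n' are large. In this case we use the asymptotic expansion of "Student's" distribution which has been worked out by R. A. Fisher$^{7}$ and is of the form,

(7) $$\frac{1}{\sqrt{\pi n}}\,\frac{\Gamma[\frac{1}{2}(n + 1)]}{\Gamma(\frac{1}{2}n)}\left(1 + \frac{t^2}{n}\right)^{-\frac{1}{2}(n+1)}dt = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}t^2}\,dt\left(1 + \frac{P_1}{n} + \frac{P_2}{n^2} + \cdots + \frac{P_k}{n^k} + \cdots\right),$$

7 "The Expansion of 'Student's' Distribution in Powers of $n^{-1}$," Metron, Vol. 5, no. 3 (1925), pp. 22-25.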





where $P_k$ is a polynomial of degree 4k in t, such that

$$P_1 = \frac{t^4 - 2t^2 - 1}{4},\qquad P_2 = \frac{3t^8 - 28t^6 + 30t^4 + 12t^2 + 3}{96},\quad\text{etc.}$$
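How quickly this expansion approaches the exact density can be seen numerically. The sketch below is only an illustration: it compares the exact "Student's" density with the normal ordinate multiplied by one, two or three terms of the series, using the polynomials $P_1$ and $P_2$ given above. The function names and the chosen values of t and n are assumptions of the example.

```python
import math


def student_pdf(t, n):
    """Exact density of Student's distribution with n degrees of freedom."""
    log_c = (math.lgamma(0.5 * (n + 1)) - math.lgamma(0.5 * n)
             - 0.5 * math.log(math.pi * n))
    return math.exp(log_c - 0.5 * (n + 1) * math.log1p(t * t / n))


def p1(t):
    return (t ** 4 - 2.0 * t ** 2 - 1.0) / 4.0


def p2(t):
    return (3.0 * t ** 8 - 28.0 * t ** 6 + 30.0 * t ** 4 + 12.0 * t ** 2 + 3.0) / 96.0


def expansion(t, n, terms):
    """Normal ordinate times the first `terms` terms (1 to 3) of the series."""
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    series = [1.0, p1(t) / n, p2(t) / n ** 2]
    return phi * sum(series[:terms])


if __name__ == "__main__":
    t, n = 1.0, 30
    exact = student_pdf(t, n)
    for terms in (1, 2, 3):
        print(terms, "term(s): error", expansion(t, n, terms) - exact)
```

At a fixed t the error shrinks by roughly a factor of n with each added term, but for larger t the polynomials grow, which is the difficulty with truncation discussed later in this section.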


The development of an asymptotic expansion for the distribution of at + bt' is obtained by combining the asymptotic expansions of t and t'. The theoretical justification of the process used makes use of the following lemma:

If $R_k(t)$ is the remainder after the first (k + 1) terms in the asymptotic expansion of "Student's" distribution in descending powers of n, then $\lim n^k\int_{-\infty}^{\infty}|R_k(t)|\,dt = 0$.

In the proof, the symbol "lim" will be used to denote the limit as n tends to infinity of the quantity in question. Let $S_k(t)$ represent the sum of the first (k + 1) terms of the above expansion. It may readily be shown that if $0 < \alpha < \frac{1}{2}$,

$$\lim n^k\int_{n^{\alpha}}^{\infty} f(t)\,dt = 0,$$

and hence that

$$\lim n^k\int_{n^{\alpha}}^{\infty}|R_k(t)|\,dt = 0.$$

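Using an expansion for the logarithm of $\left(1 + \frac{t^2}{n}\right)^{-\frac{1}{2}(n+1)}$ and the asymptotic expansion for the logarithm of the Gamma function, the following asymptotic expansion may be obtained: $\log f(t) = -\frac{1}{2}\log 2\pi - \frac{1}{2}t^2 + u$, where

(8) $$u = \frac{G_4}{n} + \frac{G_6}{n^2} + \cdots + \frac{G_{2p+2}}{n^p} + R_p + R'_p,$$

$G_{2p+2}$ being a polynomial of degree 2p+2, and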

«t ij 


\Rp\< 


irn 


(p + 2)n fii * (p + 2)n™ } 


where 0 < a < 1, 


I ^l-M I 


\K\< 


2 P *A 


(p +ljn ** 1 

02’ +i A 


(p + l)(p-f 2)7i fl+1 (p + 1) (p + 2)w +1 J 


wbereO < p < 1, 


A being a constant independent of n.





Thus, using Taylor's expansion, we obtain

$$f(t) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}t^2}\left\{1 + u + \frac{u^2}{2!} + \frac{u^3}{3!} + \cdots + \frac{u^k}{k!} + \frac{u^{k+1}}{(k + 1)!}\,e^{\theta u}\right\},$$

where $0 < \theta < 1$.

Evidently is of the form 

fo\ _i__ L-*** (frj± J_ ?*+? A _ 4 * frM . >\1 

0 L W +i+ »*+» + + + (jt+i)i /J* 

the quantities q being polynomials in t.

Using the moments of the normal distribution, it may readily be shown that


»m«‘ [''* -L e-*’ (%| + + ... + 


dl ~ 0. 


In the range of integration, when n is sufficiently large, it is evident that

H —ir~ l*»l 


i , „ 2n" 2n" , , 

lw|< T + lii' + - + n> 


Thus 


w* +l 


V»+l)l 

and hence 






+ l®i] + |r P fi|-0(n ,w ) if 0 < a < i. 


Kifi P gl 

e" 1 ' 11 , where K and K f arc constants, ' 


i-U’iKM-Jc 


lim^l |A(0|*<lim iF ^ ai aI I=,, -O ( <grpft’. 

We can also deduce the following results:

1. Since the value of the integrand is unaltered if t is replaced by −t, we have at once

$$\lim n^k\int_{-\infty}^{0}|R_k(t)|\,dt = 0.$$

2. Using both of these results it follows that

$$\lim n^k\int_{-\infty}^{\infty}|R_k(t)|\,dt = 0.$$

3. Hence

$$\lim n^k\int_{t}^{t'}|R_k(t)|\,dt = 0,$$

where t and t' have any real values, and thus it is legitimate to integrate the asymptotic expansion of f(t) term by term with respect to t between any given limits.





4. If $\phi(t)$ is a function independent of n which is bounded for all values of t, the asymptotic expansion of $\phi(t)f(t)$ in terms of n may be integrated term by term with respect to t. In particular, if $\phi(t) = e^{ixt}$, an asymptotic expansion for the characteristic function of "Student's" distribution may be obtained.

We may now consider the form of the distribution of at + bt', and in order to simplify the argument, the following reasoning applies to the case in which the sample numbers are equal, although a similar theory may be developed for sample numbers which are of equal orders of magnitude. We may write

$$f(t) = S_k(t) + R_k(t),$$
$$f(t') = S_k(t') + R_k(t'),$$
$$u = at + bt',$$

and hence $t' = \frac{u - at}{b}$. The joint distribution of u and t is therefore

$$\frac{1}{b}\left[S_k(t)\,S_k\!\left(\frac{u - at}{b}\right) + R_k(t)\,S_k\!\left(\frac{u - at}{b}\right) + S_k(t)\,R_k\!\left(\frac{u - at}{b}\right) + R_k(t)\,R_k\!\left(\frac{u - at}{b}\right)\right]du\,dt.$$


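The distribution of u is obtained by integrating this expression with respect to t over all the possible values of t between $-\infty$ and $+\infty$. The product $S_k(t)\,S_k\!\left(\frac{u - at}{b}\right)$ gives the first k + 1 terms in the asymptotic expansion which is the product of the asymptotic expansions of $f(t)$ and $f\!\left(\frac{u - at}{b}\right)$, and a remainder $\theta(t)$, where

(10) $$\theta(t) = \frac{1}{2\pi}\,e^{-\frac{1}{2}\left[t^2 + \left(\frac{u - at}{b}\right)^2\right]}\left\{\frac{v_1}{n^{k+1}} + \frac{v_2}{n^{k+2}} + \cdots + \frac{v_k}{n^{2k}}\right\},$$

$v_1, v_2, \ldots, v_k$ being polynomials in t. Let

$$\theta_1(t) = \theta(t) + R_k(t)\,S_k\!\left(\frac{u - at}{b}\right) + S_k(t)\,R_k\!\left(\frac{u - at}{b}\right) + R_k(t)\,R_k\!\left(\frac{u - at}{b}\right).$$

Using the expressions for the moments of the normal distribution, it may be shown that $\lim n^k\int_{-\infty}^{\infty}|\theta(t)|\,dt = 0$. Let the upper bound of the bounded function $S_k(t)$ for all values of n and t be B. Then

$$\int_{-\infty}^{\infty}\left|R_k(t)\,S_k\!\left(\frac{u - at}{b}\right)\right|dt < B\int_{-\infty}^{\infty}|R_k(t)|\,dt.$$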


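Similarly

$$\lim n^k\int_{-\infty}^{\infty}\left|S_k(t)\,R_k\!\left(\frac{u - at}{b}\right)\right|dt = 0$$

and

$$\lim n^k\int_{-\infty}^{\infty}\left|R_k(t)\,R_k\!\left(\frac{u - at}{b}\right)\right|dt = 0.$$

Thus

$$\lim n^k\int_{-\infty}^{\infty}|\theta_1(t)|\,dt = 0,$$

and hence the distribution of u may be obtained by integrating the asymptotic expansion which is the product of the asymptotic expansions of $f(t)$ and $f\!\left(\frac{u - at}{b}\right)$ term by term.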

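In practice, it is convenient to find the distribution of

(11) $$w = \frac{at + bt'}{\sqrt{a^2 + b^2}}.$$

We substitute $y = \frac{bt - at'}{\sqrt{a^2 + b^2}}$, and, using the above result, it follows that the distribution of w is given by

$$dw\int_{-\infty}^{\infty}\frac{1}{2\pi}\,e^{-\frac{1}{2}(w^2 + y^2)}\left[1 + \frac{1}{4n}\left\{\frac{(aw + by)^4}{(a^2 + b^2)^2} - \frac{2(aw + by)^2}{a^2 + b^2} + \frac{(bw - ay)^4}{(a^2 + b^2)^2} - \frac{2(bw - ay)^2}{a^2 + b^2} - 2\right\} + \cdots\right]dy,$$

which is equal to

(12) $$\frac{dw}{\sqrt{2\pi}}\,e^{-\frac{1}{2}w^2}\left[1 + \frac{1}{4n}\left\{\frac{w^4(a^4 + b^4) + 12w^2a^2b^2 + 3(a^4 + b^4)}{(a^2 + b^2)^2} - 2w^2 - 4\right\} + \cdots\right].$$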

It may be noticed that this distribution may be expressed in terms of the ratio a/b only. The probability integral may readily be obtained. There is no theoretical difficulty involved in obtaining any desired number of terms of this expansion, but they rapidly become too complicated to handle with any ease. Moreover, it is difficult to find a limit of the error committed in using any given number of terms of the series for the probability integral as an approximation to the value of this integral, as the somewhat complicated method of obtaining the series masks the form of the remainder. While it is undoubtedly true that when n is very large the distribution approaches normality, and for a somewhat lower range of values of n the first two terms of






the expansion should be taken, etc., it is difficult to forecast the number of terms which should be retained for any given value of n. In fact the same difficulty seems to exist when using the original asymptotic expansion of "Student's" distribution for the calculation of probabilities. For instance, the coefficients of the powers of t which occur in the sixth term of the asymptotic expansion of the probability integral are larger than those occurring in the fifth term, and, in consequence, in spite of the greater power of n in the denominator, for certain values of n these may contribute more to the probability integral than the previous term. We are unable to say anything about the aggregate of succeeding terms in general, and, therefore, it does not seem legitimate to drop all the terms following a term which yields a contribution beyond the limit of accuracy desired. This difficulty is even more apparent in the case in which the coefficients of the various powers of t occurring in the terms beyond the first involve also the ratio a/b, and it is probable that the different values of this ratio which are possible would lead to different numbers of terms being taken for the same value of n in order to gain the same degree of precision in the probability integral.


4. The distributions of the test quantities which correspond to (3) and (11) for equal means, when the ratio of the variances is a known quantity. When the ratio $\phi$ of the variances is given, the foregoing arguments, which are independent of the parameters specifying the distribution, may no longer be applied, for this would be information not supplied by the sample. In this case, the distributions of the test quantities which have been used take forms which depend only on the ratio of the variances, and are independent of the sample estimates of the variances.


The quantity (3), used in §2, when n was a small odd number, was $\frac{\bar{x} - \bar{x}'}{s + s'} = v$, where $s^2 = \frac{\Sigma(x - \bar{x})^2}{N(N - 1)}$ and n = N − 1. On the assumption of equal population means, the distribution of this quantity takes the form



Thus in the case n = 1, we obtain


> r v^_' . 

x(i+v)*L^ + i>^ r 


_L O 

^ + i) 4 j 


which is the result given by R. A. Fisher.$^{6}$ The integral may be evaluated in terms of elementary functions for small odd integral values of n.

In §3, (11), the distribution of the statistic $w = \frac{\bar{x} - \bar{x}'}{\sqrt{s^2 + s'^2}}$ was considered



when N was large. The exact distribution may be proved to be


(14) 


1.J 


rfr + iMi+^r 

+ «(i + Sprw 


dvt'F 




w(l - 4?) \ 

+ ^)y 


where F is the hypergeometric function. If $\phi = 1$, we have the limiting case in which the argument of the hypergeometric function is zero, and obtain "Student's" distribution, which is to be expected in view of the evidence stated in §1, the numbers in the samples being equal.


Columbia University.



SOME EFFICIENT MEASURES OF RELATIVE DISPERSION 1 

By Nilan Norris 

For some time it has been known that the coefficient of variation (in the sense of the ratio of the standard deviation to the arithmetic mean) is not an efficient statistic for distributions departing materially from normality.$^{2}$ At various times there have been proposed certain supplementary estimates of relative variation, such as those involving ratios between sums and differences of upper and lower quartiles, and ratios of mean deviations to medians or to arithmetic means. Some of these have appeared in certain textbooks.$^{3}$ But there appears to have been no attempt to found their use on considerations of minimum sampling variance.

The point of departure of this paper is that of using the Method of Maximum Likelihood to derive two efficient measures of relative dispersion, together with expressions for their standard errors. These optimum estimates of true or parametric variation are the ratio of the arithmetic mean to the geometric mean (the arithmetic-geometric ratio) for Pearson Type III distributions, and the ratio of the geometric mean to the harmonic mean (the geometric-harmonic ratio) for Pearson Type V distributions. The usefulness of these measures is suggested by the generalized-mean-value-function approach to the analysis of averages, especially the theorem of inequalities among averages.$^{4}$


1 Presented before a joint meeting of the American Statistical Association and the Institute of Mathematical Statistics at Chicago, Illinois on December 28, 1935.

2 The term "efficient statistic" is used here in the sense of R. A. Fisher, that is, of a parameter-estimate which tends towards normality of distribution with the least possible standard deviation. For a discussion of the inefficiency of certain commonly used statistics as applied to distributions departing from normality, see R. A. Fisher, "On the Mathematical Foundations of Theoretical Statistics," Philosophical Transactions of the Royal Society of London, Series A, Vol. 222, 1922, pp. 332-336.

3 See, for example, William Vernon Lovitt and Henry F. Holtzclaw, Statistics (Prentice-Hall, Inc., New York, 1929), p. 134; Herbert Arkin and Raymond R. Colton, Statistical Methods (Barnes and Noble, Inc., New York, 1936), revised ed., p. 41; and Herbert Sorenson, Statistics for Students in Psychology and Education (McGraw-Hill Book Company, Inc., New York, 1936), pp. 163 f.

4 Nilan Norris, "Inequalities among Averages," Annals of Mathematical Statistics, Vol. VI, No. 1, March, 1935, pp. 27-29; and "Convexity Properties of Generalized Mean Value Functions," Annals of Mathematical Statistics, Vol. VIII, No. 2, June, 1937, pp. 118-120. Professor John B. Canning appears to have been the first to point out the possibility of making use of certain ratio measures of relative variation. See "The Income Concept and Certain of Its Applications," Papers and Proceedings of the Eleventh Annual Conference of the Pacific Coast Economic Association (Edwards Brothers, Ann Arbor, 1933), p. 64.



This theorem states that if $t_1 < t_2$, then $\phi(t_1) < \phi(t_2)$, where the unit weight or simple sample type of generalized mean value function is defined as

(1) $$\phi(t) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^{\,t}\right)^{1/t}.$$

The $x_i$ are restricted to positive real numbers not all equal, but t may take any real value. A necessary and sufficient condition that $\phi(-\infty) = \phi(t) = \phi(\infty)$ is the excluded trivial case that $x_1 = x_2 = \cdots = x_n$. When the $x_i$ are not all equal, the ratios between various pairs of averages as generated by $\phi(t)$ yield ratio measures of relative dispersion, the usefulness of which depends, in part, on their efficiency as estimates of population-characterizing constants (parameters). The arithmetic-geometric ratio may be written $\frac{\phi(1)}{\phi(0)}$, and the geometric-harmonic ratio may be written $\frac{\phi(0)}{\phi(-1)}$. In certain cases it may be of convenience to reverse the order of each of the ratios. The standard errors for the two forms which each of the ratios may assume are presented below.

The demonstration that these ratio measures of relative dispersion are 100% efficient statistics for their appropriate distributions, and the derivation of useful expressions for their respective standard errors, both may be accomplished by the ordinary method of differentiating the logarithm of the likelihood.

Let digamma of $x = F_D(x) = \frac{d}{dx}\log x!$,

and trigamma of $x = F_T(x) = \frac{d^2}{dx^2}\log x!$.

For Pearson Type III distributions, the frequency with which the variate x
falls into the range dx is given by

(2) $df = \frac{1}{p!} \left( \frac{x}{a} \right)^{p} e^{-x/a} \, \frac{dx}{a}$.

The parameter a measures the absolute dispersion of the distribution, and the
parameter p determines the general shape of the frequency curve. The relative
variation may be regarded as a population parameter, $\theta$, defined as the ratio
of the population arithmetic mean to the population geometric mean. Let the
logarithm of the likelihood for this distribution be represented by L. We have

(3) $L = -n \log p! - n(p + 1) \log a + p \sum \log x_i - \frac{\sum x_i}{a}$,

where the summation is taken over the n individuals of the sample. It follows
that

(4) $\frac{\partial L}{\partial a} = -\frac{n}{a}(p + 1) + \frac{\sum x_i}{a^2}$, and $\frac{\partial^2 L}{\partial a^2} = \frac{n}{a^2}(p + 1) - \frac{2 \sum x_i}{a^3}$.

NILAN NORRIS


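When L is maximized with respect to a by equating to zero the first derivative
of L with respect to a, we find

(5) $\frac{\sum x_i}{n} = a(p + 1)$.

It also follows that

(6) $\frac{\partial L}{\partial p} = -n F_D(p) - n \log a + \sum \log x_i$; $\frac{\partial^2 L}{\partial p^2} = -n F_T(p)$; and $\frac{\partial^2 L}{\partial a \, \partial p} = \frac{\partial^2 L}{\partial p \, \partial a} = -\frac{n}{a}$.

When L is maximized with respect to p by equating to zero the first derivative
of L with respect to p, we find

(7) $\frac{\left( \prod x_i \right)^{1/n}}{a} = e^{F_D(p)}$.

The optimum estimate $\hat{p}$ of p is therefore found from (5) and (7) to be given by
the equation

(8) $(\hat{p} + 1) \, e^{-F_D(\hat{p})} = \frac{\sum x_i / n}{\left( \prod x_i \right)^{1/n}} = \frac{A}{G}$.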


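But $(p + 1) e^{-F_D(p)}$ is the parameter $\theta$, since the population arithmetic mean
is $a(p + 1)$ and the population geometric mean is $a e^{F_D(p)}$. Hence we find the
optimum estimate of $\theta$ to be $\hat{\theta} = A/G$, which can be expressed in terms of the
generalized mean value function as $\phi(1)/\phi(0)$. Therefore, for distributions well
graduated by a Type III curve the optimum estimate of $\theta$, the ratio of the
arithmetic mean to the geometric mean, is given by $A/G$.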

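If only p is being estimated (a given), the variance, or square of the standard
deviation, of $\hat{p}$ is obtained from $\frac{\partial^2 L}{\partial p^2}$, and is $V(\hat{p}) = \frac{1}{n F_T(p)}$. To a first
approximation, the variance of $A/G$, the estimate of $\theta$, is found from the usual relation
between the variance of a function and the variance of the argument, namely

(9) $V(\hat{\theta}) = \left[ \frac{d\theta}{dp} \right]^2 V(\hat{p})$.

Since

(10) $\frac{d\theta}{dp} = e^{-F_D(p)} \left[ 1 - (p + 1) F_T(p) \right]$,

therefore

(11) $V\!\left( \frac{A}{G} \right) = \frac{e^{-2 F_D(p)} \left[ 1 - (p + 1) F_T(p) \right]^2}{n F_T(p)}$,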





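or the standard error of $A/G$ is the square root of the last expression, if only p is
being estimated. If it is more convenient to do so, one may reverse the terms
in the ratio to obtain

(12) $V\!\left( \frac{G}{A} \right) = \frac{e^{2 F_D(p)} \left[ (p + 1) F_T(p) - 1 \right]^2}{n \, (p + 1)^4 \, F_T(p)}$,

and extract the square root of the last expression to obtain the standard error
of $G/A$.

If a and p are being estimated simultaneously, there exists the matrix of
negative mean values

(13) $\begin{pmatrix} -E\!\left( \frac{\partial^2 L}{\partial a^2} \right) & -E\!\left( \frac{\partial^2 L}{\partial a \, \partial p} \right) \\ -E\!\left( \frac{\partial^2 L}{\partial p \, \partial a} \right) & -E\!\left( \frac{\partial^2 L}{\partial p^2} \right) \end{pmatrix} = \begin{pmatrix} \frac{n}{a^2}(p + 1) & \frac{n}{a} \\ \frac{n}{a} & n F_T(p) \end{pmatrix}$,

from which the variance of $A/G$ can be computed. In fact we have

(14) $V(\hat{p}) = \frac{p + 1}{n \left[ (p + 1) F_T(p) - 1 \right]}$,

and consequently

(15) $V\!\left( \frac{A}{G} \right) = \frac{(p + 1) \, e^{-2 F_D(p)} \left[ (p + 1) F_T(p) - 1 \right]}{n}$.

The standard error of $A/G$ is equal to the square root of the expression in (15),
if both a and p are being estimated. If the terms in the ratio are reversed,
one obtains

(16) $V\!\left( \frac{G}{A} \right) = \frac{e^{2 F_D(p)} \left[ (p + 1) F_T(p) - 1 \right]}{n \, (p + 1)^3}$.

The square root of the last expression may be taken to derive the standard
error of $G/A$. Since the digamma and the trigamma functions have been tabulated
for considerable ranges,* these standard error formulae, and those developed
below for the Type V case should be quite useful.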
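Where printed tables are not at hand, the Type III estimate and its approximate standard error can be sketched numerically. The following is an illustration under stated assumptions, not the author's procedure: it takes $F_D(p)$ and $F_T(p)$ to be the digamma and trigamma functions evaluated at p + 1 (as implied by the definition through log p!), takes $\theta = (p+1)e^{-F_D(p)}$, applies the first-order relation $V(\hat\theta) \approx (d\theta/dp)^2 V(\hat p)$, and uses $V(\hat p) = 1/(nF_T(p))$ or $(p+1)/(n[(p+1)F_T(p) - 1])$ according as a is given or estimated. All function names are hypothetical.

```python
import math

def digamma(x):
    """psi(x) for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (
        1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def trigamma(x):
    """psi'(x) for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))

def theta(p):
    """Population A/G ratio for shape p: (p + 1) * exp(-F_D(p))."""
    return (p + 1.0) * math.exp(-digamma(p + 1.0))

def solve_p(ratio, lo=-0.99, hi=1.0e6):
    """Bisection for p such that theta(p) = ratio (theta decreases in p)."""
    if ratio <= 1.0:
        raise ValueError("A/G must exceed 1")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if theta(mid) > ratio:
            lo = mid   # theta too large, so p must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

def se_ag_ratio(p, n, scale_estimated=True):
    """First-order standard error of A/G for a sample of n."""
    ft = trigamma(p + 1.0)
    dtheta = math.exp(-digamma(p + 1.0)) * (1.0 - (p + 1.0) * ft)
    if scale_estimated:
        var_p = (p + 1.0) / (n * ((p + 1.0) * ft - 1.0))
    else:
        var_p = 1.0 / (n * ft)
    return abs(dtheta) * math.sqrt(var_p)

p_hat = solve_p(theta(2.0))          # round trip: recovers p = 2
se_one = se_ag_ratio(p_hat, 100, scale_estimated=False)
se_two = se_ag_ratio(p_hat, 100, scale_estimated=True)
print(p_hat, se_one, se_two)
```

The standard error is larger when the scale parameter must also be estimated, because the variance of the shape estimate is then inflated.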


* British Association for the Advancement of Science: Mathematical Tables (Office of the
British Association, London, 1931), Vol. I, pp. 42-51.





For Pearson Type V distributions, the frequency with which the variate x
falls into the range dx is given by

(17) $df = \frac{1}{p!} \left( \frac{a}{x} \right)^{p+2} e^{-a/x} \, \frac{dx}{a}$.

The parameter a measures the absolute dispersion of the distribution, and the
parameter p determines the general shape of the frequency curve. The relative
dispersion may be regarded as a population parameter, $\theta'$, defined as the ratio
of the population geometric mean to the population harmonic mean. Let the
logarithm of the likelihood for this distribution be represented by L. Then

(18) $L = -n \log p! + n(p + 1) \log a - (p + 2) \sum \log x_i - a \sum \frac{1}{x_i}$,

the summation being taken over the sample of n individuals. It follows that

(19) $\frac{\partial L}{\partial p} = -n F_D(p) + n \log a - \sum \log x_i$; $\frac{\partial^2 L}{\partial p^2} = -n F_T(p)$;
$\frac{\partial L}{\partial a} = \frac{n}{a}(p + 1) - \sum \frac{1}{x_i}$; $\frac{\partial^2 L}{\partial a^2} = -\frac{n}{a^2}(p + 1)$;
$\frac{\partial^2 L}{\partial p \, \partial a} = \frac{\partial^2 L}{\partial a \, \partial p} = \frac{n}{a}$.

Let L be maximized with respect to p to derive the geometric mean, and let L be
maximized with respect to a to derive $\phi(-1)$, or H, the harmonic mean. It is
clear that for the Type V distribution, the relative dispersion, as we have defined
it, is the population parameter $(p + 1) e^{-F_D(p)}$. Therefore, if $\phi(0) = G = \left( \prod x_i \right)^{1/n}$,
and $\phi(-1) = H = \frac{n}{\sum 1/x_i}$, it follows, by an argument similar to that used in the
case of a Type III curve, that the geometric-harmonic ratio, $G/H$, is an optimum
estimate of the parameter $\theta'$, for distributions well graduated by the Pearson
Type V curve.

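If only p is being estimated, the variance of $\hat{p}$ is given by $V(\hat{p}) = \frac{1}{n F_T(p)}$, and

(20) $V\!\left( \frac{G}{H} \right) = \frac{e^{-2 F_D(p)} \left[ 1 - (p + 1) F_T(p) \right]^2}{n F_T(p)}$,

or the standard error of $G/H$ is the square root of the last expression, if p alone is
being estimated, a being given. If the terms in the ratio are reversed,

(21) $V\!\left( \frac{H}{G} \right) = \frac{e^{2 F_D(p)} \left[ (p + 1) F_T(p) - 1 \right]^2}{n \, (p + 1)^4 \, F_T(p)}$,

and the square root of the last expression yields the standard error of $H/G$.
If a and p are being estimated simultaneously, there exists the matrix

(22) $\begin{pmatrix} -E\!\left( \frac{\partial^2 L}{\partial a^2} \right) & -E\!\left( \frac{\partial^2 L}{\partial a \, \partial p} \right) \\ -E\!\left( \frac{\partial^2 L}{\partial p \, \partial a} \right) & -E\!\left( \frac{\partial^2 L}{\partial p^2} \right) \end{pmatrix} = \begin{pmatrix} \frac{n}{a^2}(p + 1) & -\frac{n}{a} \\ -\frac{n}{a} & n F_T(p) \end{pmatrix}$,

from which the variance of $G/H$ can be found. In fact

(23) $V(\hat{p}) = \frac{p + 1}{n \left[ (p + 1) F_T(p) - 1 \right]}$,

and hence

(24) $V\!\left( \frac{G}{H} \right) = \frac{(p + 1) \, e^{-2 F_D(p)} \left[ (p + 1) F_T(p) - 1 \right]}{n}$,

the square root of which yields the standard error of $G/H$. If the terms in the
ratio are reversed,

(25) $V\!\left( \frac{H}{G} \right) = \frac{e^{2 F_D(p)} \left[ (p + 1) F_T(p) - 1 \right]}{n \, (p + 1)^3}$,

the square root of which yields the standard error of $H/G$.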

Just as the coefficient of variation is an efficient statistic only for distributions
well graduated by the normal, or Pearson Type VII curve, so also the two maximum
likelihood estimates of relative dispersion herein developed are efficient
only when applied to their appropriate distributions. One may expect to
obtain an optimum degree of efficiency only when the arithmetic-geometric
ratio is used for series well specified by the Type III function, and the geometric-
harmonic ratio is used for series well specified by the Type V function.

It may be recalled that Karl Pearson proposed the use of the coefficient of
variation late in the nineteenth century.⁶ Since that time there appears to
have been some tendency to rely on it as a measure of relative variation, regardless
of whether or not it extracts from the sample a relatively large amount of
the pertinent information concerning the parent population.⁷ There are several
cases in which the coefficient of variation is not an optimum estimate of relative
dispersion. For example, in a comparison of the true or parametric variation
of the weights of humans of given age levels, the arithmetic-geometric ratio is
often the appropriate statistic to use, since weights tend to be distributed
according to the Pearson Type III law. Frequently the distribution of weights
is very well graduated by the Type V function, if the origin is fixed at 0 in
advance. Although this procedure yields a special two-parameter Type V
function, the principle of using the geometric-harmonic ratio as an optimum
estimate of relative dispersion is still valid. Again, in a comparison of the
relative variation of the personal distribution of wealth and income in certain
modern countries, the arithmetic-geometric ratio will be found to have a smaller
sampling variance than that of the coefficient of variation, since the personal
distribution of wealth and income in these countries tends to be in accordance
with the Type III law, rather than the normal law. Similarly, the distribution
of the number of trials required to obtain r successes of an event having a given
probability usually follows the Type III function, and requires the use of the
arithmetic-geometric ratio, if the maximum amount of the relevant information
is to be extracted from the sample.

It seems clear that in practice the usefulness of the arithmetic-geometric
ratio and the geometric-harmonic ratio will depend on the type of the distribution
with which one is dealing, and on the extent to which added efficiency is
desired. In certain cases there is doubtless room for some difference of opinion
as to whether or not the degree of added efficiency achieved by the use of these
maximum likelihood estimates of relative dispersion will merit departing from
the use of such a time-honored statistic as the coefficient of variation. If one is
interested in avoiding the assumption of normality implicit in methods customarily
used in the more general problem of analysis of variance, an alternative
is the use of ranks.⁸ Although the efficiency of these rank-correlation methods
is not always 100%, their economy of effort is sometimes a great advantage.

Hunter College of the City of New York.


⁶ "Regression, Heredity, and Panmixia," Philosophical Transactions of the Royal Society
of London, Series A, Vol. 187, 1896, p. 277. For materials pertaining to the Pearson-Thorndike
controversy resulting from the latter's suggestion that the ratio of the standard
deviation to the square root of the arithmetic mean is often a more suitable device than is
the coefficient of variation see Edward L. Thorndike, "Empirical Studies in the Theory of
Measurement," Archives of Psychology (The Science Press, New York, 1907), Vol. 1, No. 3,
April, 1907, pp. 9-13; and An Introduction to the Theory of Mental and Social Measurements
(Teachers College, Columbia University, New York, 1913), 2d ed., pp. 133 f., or 1st ed.,
1904, pp. 102 f. See also Helen M. Walker, Studies in the History of Statistical Method
(The Williams and Wilkins Company, Baltimore, 1929), p. 178.

⁷ Cf. Walter A. Hendricks and Kate W. Robey, "The Sampling Distribution of the
Coefficient of Variation," Annals of Mathematical Statistics, Vol. VII, No. 4, December,
1936, pp. 129-132.

⁸ Harold Hotelling and Margaret Richards Pabst, "Rank Correlation and Tests of
Significance Involving No Assumption of Normality," Annals of Mathematical Statistics,
Vol. VII, No. 1, March, 1936, pp. 29-48. See also Milton Friedman, "The Use of Ranks to
Avoid the Assumption of Normality Implicit in the Analysis of Variance," Journal of the
American Statistical Association, Vol. 32, No. 200, December, 1937, pp. 675-701.



NOTES ON THE DISTRIBUTION OF THE GEOMETRIC MEAN¹


By Burton H. Camp

There are two transformation theorems which apply particularly well to the
distribution of a product and therefore to the distribution of the geometric
mean of a sample. Both are implicit in the known theory of the transformation
of integrals, but it is useful to state them in forms which are especially adapted
to probability theory. Several examples will be considered in which distributions
of the geometric mean will be derived by using these theorems.

The first theorem may be stated as

Theorem A: Let the point set q in an N-dimensional u-space be defined so that
in q a given function of the u's, $F(u_1, u_2, \cdots, u_N)$, has the property that

(1) $\xi \le F < \xi + d\xi$.

Let $\bar{q}$ be the elementary volume of the point set q defined as an N-tuple integral

$\int_q du_1 \cdots du_N$

taken over q, having a value of order $d\xi$. Let

(2) $u_i = T(t_i)$, $i = 1, 2, \cdots, N$,

be continuous and differentiable monotonic functions of the t's with unique inverses

(3) $t_i = T^{-1}(u_i)$.

Let r be the point set in t-space corresponding to q in u-space under the transformation
(2) with elementary volume given by the integral

(4) $\bar{r} = \int_r dt_1 \cdots dt_N$.

If $J(\xi)$ is defined as

$\frac{dt_1}{du_1} \cdots \frac{dt_N}{du_N}$

at a point in q for which $F = \xi$, and if for all points in q

(5) $\left| \frac{dt_1}{du_1} \cdots \frac{dt_N}{du_N} \cdot \frac{1}{J(\xi)} - 1 \right| < M \, d\xi$,

where M is a constant, independent of q, then the volume $\bar{r}$ is, except for terms of
order $(d\xi)^2$, given by

(6) $\bar{r} = \bar{q} \, |J(\xi)|$.


1 Read at a joint meeting of the American Mathematical Society and the Institute of
Mathematical Statistics, Indianapolis, December 30, 1937.



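The proof is immediate for we have

$\bar{r} = \int_r dt_1 \cdots dt_N = \int_q \left| \frac{dt_1}{du_1} \cdots \frac{dt_N}{du_N} \right| du_1 \cdots du_N$

$\quad = \bar{q} \, |J(\xi)| + |J(\xi)| \int_q \left[ \left| \frac{dt_1}{du_1} \cdots \frac{dt_N}{du_N} \cdot \frac{1}{J(\xi)} \right| - 1 \right] du_1 \cdots du_N$.

But, by (5), the integral in the last line has a value less than $\bar{q} M \, d\xi$, and $\bar{q}$ is of
order $d\xi$. Therefore $\bar{r}$ differs from $\bar{q} \, |J(\xi)|$ by terms of order $(d\xi)^2$.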

Let us now apply this theorem to a simple case. The volume of the set q,
where $\xi \le u_1 + \cdots + u_N < \xi + d\xi$, $u_i < a$, $i = 1, \cdots, N$, can easily be shown
to be

$\bar{q} = C (Na - \xi)^{N-1} \, d\xi$.

Let $u_i = \log t_i$. Then it follows from the theorem that

$\bar{r} = K e^{\xi} (Na - \xi)^{N-1} \, d\xi$,

$\bar{r}$ being the volume of the point set r, where

(7) $\xi \le \log (t_1 \cdots t_N) < \xi + d\xi$.

By the use of (7) one can now use the geometrical method of finding the probability
distribution of the geometric mean,

(8) $x = (t_1 \cdots t_N)^{1/N}$,

of samples of N from the universe $\phi(t) \, dt$, provided that $\phi(t_1) \cdots \phi(t_N)$ is a
continuous function of x. Unfortunately there do not appear to be many such
functions. One that is of interest is

$\phi(t) \, dt = k t^{h} \, dt$, $0 \le t < e^{a}$.

Let $D(\xi) \, d\xi$ represent the distribution of $\xi$. We have

$D(\xi) \, d\xi = \int_r \phi(t_1) \cdots \phi(t_N) \, dt_1 \cdots dt_N = k^N \int_r (t_1 \cdots t_N)^{h} \, dt_1 \cdots dt_N$

$\quad = \bar{r} \, k^N e^{h\xi} = C' e^{(h+1)\xi} (Na - \xi)^{N-1} \, d\xi$.

Thence, since $\xi = N \log x$, we obtain as the distribution of x:

$f(x) \, dx = C_1 x^{N(h+1)-1} (a - \log x)^{N-1} \, dx$.

The form of $f(x)$ in the special case in which $h = 0$ and $\phi$ is a rectangle has been
found by other authors,² and is

$f(x) \, dx = C_1 x^{N-1} (a - \log x)^{N-1} \, dx$.


2 E.g., see S. Kullback, "An application of characteristic functions to the distribution
problem of statistics," Annals of Mathematical Statistics, vol. 5 (1934), pp. 263-270.





The second transformation theorem to be used may be stated as

Theorem B: Let $\psi(u) \, du$ be the probability element for a given universe and let
the sample $(u_1, u_2, \cdots, u_N)$ be taken. Let the statistic $\xi = \gamma(u_1, \cdots, u_N)$ have
the distribution $F(\xi) \, d\xi$. If the transformation (2), satisfying the conditions imposed
on it in Theorem A, be applied both to the universe and to the statistic, yielding
$\phi(t) \, dt$ and $\xi = g(t_1, \cdots, t_N)$ respectively, then the element of distribution of $\xi$, as
obtained from $\phi$, is, as before, $F(\xi) \, d\xi$.

The proof is straightforward, for the distribution of $\xi$, as obtained from $\psi(u) \, du$,
is given by

$\int_q \psi(u_1) \cdots \psi(u_N) \, du_1 \cdots du_N$,

and, as obtained from $\phi(t) \, dt$, it is

$\int_r \phi(t_1) \cdots \phi(t_N) \, dt_1 \cdots dt_N$,

where q is the set in u-space where $\xi \le \gamma < \xi + d\xi$ and r is the set in t-space
where $\xi \le g < \xi + d\xi$. It is clear that these two integrals have the same value
because of the relation

$\psi(u) \, du = \psi(T(t)) \, T'(t) \, dt = \phi(t) \, dt$

and the unique correspondence between the points of q and r set up by the
transformation (2), with its unique inverse (3).

This theorem is particularly well adapted to the derivation of the distribution
of the geometric mean because of the simple logarithmic transformation connecting
the sum and the product of N numbers, and because several distributions
of the sum are already known. Two of these cases will now be presented.

Example 1. Let x be the geometric mean (8) of the sample of N from a
universe with distribution law

(9) $\phi(t) \, dt = \frac{(\log t)^{p-1}}{t^{2} \, \Gamma(p)} \, dt$, $(t > 1)$.

Then the distribution of x is

(10) $f(x) \, dx = \frac{N^{Np} (\log x)^{Np-1}}{x^{N+1} \, \Gamma(Np)} \, dx$, $(x > 1)$,

and it is to be noticed that x has the same type of distribution as t.

To prove (10), first let $\xi = (u_1 + \cdots + u_N)/N$, where the u's are a sample
from a Type III universe,

$\psi(u) \, du = \frac{e^{-u} u^{p-1}}{\Gamma(p)} \, du$, $(u > 0)$.



Irwin³ has shown that the distribution of $\xi$ is

(11) $F(\xi) \, d\xi = \frac{N^{Np} \, \xi^{Np-1} e^{-N\xi}}{\Gamma(Np)} \, d\xi$.

Making the transformation $u = \log t$, we have

$\xi = \log x$, $\psi(u) \, du = \phi(t) \, dt$, $(t > 1)$,

and $F(\xi) \, d\xi$ is unchanged. We now obtain $f(x) \, dx$ by substituting $\xi = \log x$ in
(11).

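Example 2. If x is the geometric mean (8) of a sample of N from a universe
whose distribution is

(12) $\phi(t) \, dt = \frac{1}{t c \sqrt{2\pi}} \, e^{-\frac{1}{2c^2} \left( \log \frac{t}{G} \right)^2} dt$, $(c, t, G > 0)$,

the distribution of x is

(13) $f(x) \, dx = \frac{\sqrt{N}}{x c \sqrt{2\pi}} \, e^{-\frac{N}{2c^2} \left( \log \frac{x}{G} \right)^2} dx$, $(x > 0)$.

To prove this, one begins with the arithmetic mean $\xi$ and the universe,

$\psi(u) \, du = \frac{1}{c \sqrt{2\pi}} \, e^{-\frac{1}{2c^2}(u - a)^2} du$. Here $F(\xi) \, d\xi = \frac{\sqrt{N}}{c \sqrt{2\pi}} \, e^{-\frac{N}{2c^2}(\xi - a)^2} d\xi$.

Again using $u = \log t$, one obtains $\xi = \log x$, and

$\phi(t) \, dt = \frac{1}{t c \sqrt{2\pi}} \, e^{-\frac{1}{2c^2}(\log t - a)^2} dt$, where $G = e^{a} > 0$,

and $F(\xi) \, d\xi$ is unchanged. To get (13) one substitutes $\xi = \log x$ in $F(\xi) \, d\xi$.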
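The content of Example 2 can be checked by a small simulation. The sketch below is illustrative only: it assumes the universe is the one in which log t is normal with mean a = log G and standard deviation c, so that the logarithm of the geometric mean of N observations should be normal with the same mean and standard deviation c/√N. The function name, parameter values, and seed are assumptions of the sketch.

```python
import math
import random
import statistics

def log_geometric_means(a, c, n, reps, seed=0):
    """Simulate log(x) for the geometric mean x of n observations.

    Each observation t satisfies log t ~ Normal(a, c^2), so log x is
    simply the arithmetic mean of the n simulated logarithms.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        logs = [rng.gauss(a, c) for _ in range(n)]
        out.append(sum(logs) / n)
    return out

# Hypothetical settings: G = e, c = 0.5, samples of N = 4.
a, c, N = 1.0, 0.5, 4
logs = log_geometric_means(a, c, N, reps=4000, seed=1)

observed_mean = statistics.mean(logs)   # should be near a = log G
observed_sd = statistics.stdev(logs)    # should be near c / sqrt(N)
expected_sd = c / math.sqrt(N)
print(observed_mean, observed_sd, expected_sd)
```

Only the parameter c changes, to c/√N, which is the sense in which the geometric mean has the same type of distribution as the universe.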

Again it follows that the geometric mean has the same distribution as the
universe except for a change in one of the parameters (c). This frequency curve
has other interesting features. It was developed by Galton and McAlister⁵ by
quite a different method and was called the curve of equal facility. They were
seeking for a distribution $\phi(t)$ which would have the characteristic that, if t and
t' were two observations differing from G by the same relative amount, $(G - t)/t
= (t' - G)/G$, they would have equal probabilities. McAlister noted various
properties of $\phi$, including the fact that G was actually its geometric mean, and
that it was not the same as the mode or the arithmetic mean. Certain properties
which he did not mention are the following:

(i) If one draws a sample from a universe with the distribution $\phi$ in order to
determine G, the geometric mean of the universe, the maximum likelihood
solution is x, the geometric mean of the sample.

(ii) The modal point of the sampling distribution (f) approaches G as a limit
as N becomes infinite.

(iii) One can devise a function s of the sample analogous to but different from
Student's s, and show that x/s has a distribution independent of the parameters
G and c of the universe. To do this it is necessary first to extend the second
transformation theorem so as to include cases where the number of statistics
(functions of the sample) being obtained simultaneously is greater than one.
This is not difficult, but since the analogous tests for significance have been
developed for the normal universe it would not be particularly useful, for if the
observations are distributed in accordance with $\phi(t)$ their logarithms are distributed
normally, and their logarithms can equally well be used for testing
significance.

(iv) If one uses the curve of equal facility instead of the normal curve as the
distribution of biological lengths, then any power of such lengths, in particular
the third power, which is supposed to be approximately proportional to weights,
would also be distributed in the same manner, except for a change in the parameters.
This is a property which the normal curve does not have. It raises
the question: Can biological lengths be represented by the curve of equal
facility? The remainder of this paper will be devoted to a discussion of this
question and cognate matters.


3 Biometrika, vol. 19 (1927), p. 229; see also A. Church, Biometrika, vol. 18 (1926), p. 386.

4 This distribution can also be obtained by the method of A. T. Craig, American Journal
of Mathematics, vol. 54 (1932), p. 353, but it would be difficult to evaluate his integral
without the substitution which would be suggested if the distribution were known.

5 Proceedings of the Royal Society, vol. 29 (1879), pp. 365, 367.

The curve of equal facility may be made to approach as a limit the normal
curve if the origin be moved indefinitely to the left. This is almost intuitively
evident from a consideration of the hypotheses under which the two curves were
derived by Galton and McAlister. It is also indicated by the behavior of the
lower moments. Let $\nu_i$ refer to the ith moment of (12) relative to the origin
of t, $\mu_i$ to the corresponding moment relative to the arithmetic mean. It is
easy to show that

(14) $\nu_i = G^{i} h^{i^2}$, $i = 0, 1, \cdots$; $\nu_1 = l = Gh$, where $h = e^{\frac{1}{2} c^2}$,

(15) $\mu_2 = \sigma^2 = G^2 h^2 (h^2 - 1)$, $\mu_3 = G^3 h^3 (h^6 - 3h^2 + 2)$,

$\mu_4 = G^4 h^4 (h^{12} - 4h^6 + 6h^2 - 3)$,

(16) $\alpha_3 = \mu_3 / \mu_2^{3/2} = (h^2 + 2)(h^2 - 1)^{1/2}$,

$\alpha_4 = \mu_4 / \mu_2^{2} = (h^2 - 1)^4 + 6(h^2 - 1)^3 + 15(h^2 - 1)^2 + 16(h^2 - 1) + 3$.

From (16) it follows that as h approaches unity $\alpha_3$ and $\alpha_4$ approach their normal
values, 0 and 3, respectively. If at the same time $\sigma$ is kept constant, it follows
from (15) that G and therefore l become infinite. So the origin is moved an
infinite distance to the left.

The question, then, whether the curve of equal facility may be used equally
well with the normal curve to represent biological lengths depends on whether
in practical cases the natural choice of origin, which is the position indicated by
zero length, is such as to make the two curves practically indistinguishable.
This is apparently the situation in the case of human statures. For 8585 adult
males born in the British Isles⁶ the values of the several constants, obtained by
so fitting $\phi(t)$ to the observations that the mean and standard deviations agree,
are as follows: $l = 67.46$ in., $G = 67.411$, $\sigma = 2.56$, $h = 1.00072$; observed
$\alpha_3 = 0.0126$, $\alpha_3$ for $\phi = 0.11$; observed $\alpha_4 = 3.149$, $\alpha_4$ for $\phi = 3.02$. Thus for the
curve of equal facility $\alpha_3$ is further from the observed value than for the normal
curve, but $\alpha_4$ is nearer to its observed value. In both cases the difference is
unimportant. A graph of both curves⁷ would not make it clear to the eye which
of the two fitted the data better.
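The shape constants quoted for the fitted curve can be reproduced from h alone. The sketch below is offered as a check under stated assumptions: it takes the raw moments of the curve of equal facility to be $\nu_i = G^i h^{i^2}$ with $h = e^{c^2/2}$, computes $\alpha_3$ and $\alpha_4$ both from the closed forms in $h^2 - 1$ and directly from central moments, and evaluates them at the value h = 1.00072 quoted in the text. Function names are hypothetical.

```python
import math

def shape_from_h(h):
    """Closed forms for sigma/l, alpha_3 and alpha_4 in terms of h."""
    y = h * h - 1.0
    cv = math.sqrt(y)                           # sigma / l
    alpha3 = (h * h + 2.0) * math.sqrt(y)
    alpha4 = y ** 4 + 6 * y ** 3 + 15 * y ** 2 + 16 * y + 3
    return cv, alpha3, alpha4

def shape_from_moments(h, G=1.0):
    """Same constants from raw moments nu_i = G^i * h^(i*i)."""
    nu = [G ** i * h ** (i * i) for i in range(5)]
    mu2 = nu[2] - nu[1] ** 2
    mu3 = nu[3] - 3 * nu[2] * nu[1] + 2 * nu[1] ** 3
    mu4 = nu[4] - 4 * nu[3] * nu[1] + 6 * nu[2] * nu[1] ** 2 - 3 * nu[1] ** 4
    return math.sqrt(mu2) / nu[1], mu3 / mu2 ** 1.5, mu4 / mu2 ** 2

cv, alpha3, alpha4 = shape_from_h(1.00072)
cv_m, alpha3_m, alpha4_m = shape_from_moments(1.00072)
print(round(cv, 4), round(alpha3, 2), round(alpha4, 2))
```

At h = 1.00072 the closed forms give a coefficient of variation of about 0.038, $\alpha_3$ of about 0.11 and $\alpha_4$ of about 3.02, in agreement with the figures quoted for the statures.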

It would be expected that the distribution of the cubes of these statures, being
roughly proportional to the weights of the men, would not be normally distributed.
This also can be verified easily, for the distribution of $(y = t^3)$ from
$\phi(t) \, dt$ is $\phi(y) \, dy$ except that $3c$ replaces c, and $G^3$ replaces G. So the distribution
of cubes is:

$\phi(y) \, dy = \frac{1}{3 c y \sqrt{2\pi}} \, e^{-\frac{1}{18 c^2} \left( \log \frac{y}{G^3} \right)^2} dy$.

If this curve is fitted to the cubes of the statures, $\alpha_3 = 0.23$, and $\alpha_4 = 3.21$.
Both are considerably further from their normal values than before. For this
case the corresponding value of h is 1.0004. It is the closeness of this quantity
to unity, or in other words the smallness of the coefficient of variation,
$100 \, \sigma / l = 100 (h^2 - 1)^{1/2}$, which determines how close the curve is to the normal.
For the statures $\sigma / l = 0.0379$. For the cubes of the statures⁸ $\sigma / l = 0.209$. Its
values in certain other cases⁹ are: length of forearm 0.05, chest circumference
0.08, strength of grip 0.20, visual acuity 0.39. It appears to be evident, therefore,
that for many types of biometric measurements, especially lengths, which
we know can be represented well by the normal curve, the curve of equal facility
is practically just as good. In a given case it may fit a little better or a little
worse. If we wish the distribution of the arithmetic mean as obtained by
sampling from such data we may find it by supposing the universe normal;
if we wish the distribution of the geometric mean we may find it by supposing
the universe of a curve of equal facility. This device of substituting for the
normal curve another type of curve which is equally good in practical cases, in
order to find the distribution of a statistic which cannot be found easily for the
normal curve, may perhaps be useful also for other statistics than the geometric
mean.


Wesleyan University.


6 G. Udny Yule and M. G. Kendall, An Introduction to the Theory of Statistics, London,
1937, pp. 94, 116, 137, 163, 187.

7 Such as on page 187, Yule and Kendall.

8 For the weights of a similar group of men $\sigma / l = 0.137$, and thus the two curves would
be more nearly alike if fitted to weights than if fitted to the cubes of these statures.

9 From a long list with values ranging from 0.0049 to 0.8068, compiled by Raymond
Pearl, Medical Biometry and Statistics, Philadelphia (1930), pp. 347-9.



NOTE ON A FORMULA FOR THE MULTIPLE CORRELATION
COEFFICIENT

By H. M. Bacon

There are many useful formulas available for the calculation of the multiple
correlation coefficient in a k variable problem.¹ Since it frequently happens
that the regression equation is the primary object of the statistical analysis,
the well known formula

$r^2_{1.23 \cdots k} = \beta_{12.34 \cdots k} \, r_{12} + \beta_{13.24 \cdots k} \, r_{13} + \cdots + \beta_{1k.23 \cdots (k-1)} \, r_{1k}$

can be used to considerable advantage. While many different demonstrations
of this formula are perfectly familiar, the one given in this note may prove
of some interest.

First let us recapitulate briefly certain facts about the regression coefficients
and the multiple correlation coefficient. Suppose we have k sets of N numbers
each:

$X_{11} \; X_{12} \; \cdots \; X_{1N}$
$X_{21} \; X_{22} \; \cdots \; X_{2N}$
$\cdots$
$X_{k1} \; X_{k2} \; \cdots \; X_{kN}$

Let $\bar{X}_j$ be the mean of the j-th set, and let $x_{ji} = X_{ji} - \bar{X}_j$. We then have k
sets of N deviations from means, and we shall suppose the following k sets to
be linearly independent:

$x_{11} \; x_{12} \; \cdots \; x_{1N}$
$x_{21} \; x_{22} \; \cdots \; x_{2N}$
$\cdots$
$x_{k1} \; x_{k2} \; \cdots \; x_{kN}$

We shall consider only the regression of the "variable" $x_1$ upon $x_2, x_3, \cdots, x_k$.
Clearly the results obtained can be made to describe the regression of any one
of the variables upon the other k - 1 variables by rearranging the subscripts.
As usual let $\lambda_2, \lambda_3, \cdots, \lambda_k$ have values which will make the sum of squares

$F(\lambda_2, \lambda_3, \cdots, \lambda_k) = \Sigma (x_{1i} - \lambda_2 x_{2i} - \lambda_3 x_{3i} - \cdots - \lambda_k x_{ki})^2$

a minimum. For simplicity we shall omit stating limits of summation and
understand hereafter that $\Sigma$ means "sum for i from i = 1 to i = N."

1 For example, see W. J. Kirkham, "Note on the Derivation of the Multiple Correlation
Coefficient", The Annals of Mathematical Statistics, Volume VIII (1937), pp. 68-71.

Necessary conditions (which are easily shown to be sufficient) are that $\lambda_2, \lambda_3, \cdots, \lambda_k$
must satisfy the equations

(1) $\frac{\partial F}{\partial \lambda_2} = -2 \Sigma (x_{1i} - \lambda_2 x_{2i} - \lambda_3 x_{3i} - \cdots - \lambda_k x_{ki}) x_{2i} = 0$

$\frac{\partial F}{\partial \lambda_3} = -2 \Sigma (x_{1i} - \lambda_2 x_{2i} - \lambda_3 x_{3i} - \cdots - \lambda_k x_{ki}) x_{3i} = 0$

$\cdots$

$\frac{\partial F}{\partial \lambda_k} = -2 \Sigma (x_{1i} - \lambda_2 x_{2i} - \lambda_3 x_{3i} - \cdots - \lambda_k x_{ki}) x_{ki} = 0$.

These equations are simply the "normal equations" for determining the regression
coefficients. Solving them we obtain

$\lambda_2 = b_{12.34 \cdots k}$

$\lambda_3 = b_{13.24 \cdots k}$

$\cdots$

$\lambda_k = b_{1k.23 \cdots (k-1)}$.

The equation of regression of $x_1$ on $x_2, x_3, \cdots, x_k$ is therefore

$x_1 = b_{12.34 \cdots k} \, x_2 + b_{13.24 \cdots k} \, x_3 + \cdots + b_{1k.23 \cdots (k-1)} \, x_k$.

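If we let

$u_i = b_{12.34 \cdots k} \, x_{2i} + b_{13.24 \cdots k} \, x_{3i} + \cdots + b_{1k.23 \cdots (k-1)} \, x_{ki}$

for i = 1, 2, ..., N then $x_{1i} - u_i$ is the residual of the i-th $x_1$. The coefficient
of multiple correlation of $x_1$ in terms of $x_2, x_3, \cdots, x_k$ is defined to be
the simple correlation coefficient of the $x_1$'s and u's:

$r_{1.23 \cdots k} = \frac{\Sigma x_{1i} u_i}{\sqrt{\Sigma x_{1i}^2 \; \Sigma u_i^2}}$.

In case it is desired to express the x's in terms of their standard deviations,
the following equation is used:

$\frac{x_1}{\sigma_1} = \beta_{12.34 \cdots k} \frac{x_2}{\sigma_2} + \beta_{13.24 \cdots k} \frac{x_3}{\sigma_3} + \cdots + \beta_{1k.23 \cdots (k-1)} \frac{x_k}{\sigma_k}$

or

$t_1 = \beta_{12.34 \cdots k} \, t_2 + \beta_{13.24 \cdots k} \, t_3 + \cdots + \beta_{1k.23 \cdots (k-1)} \, t_k$

where

(2) $\beta_{12.34 \cdots k} = b_{12.34 \cdots k} \frac{\sigma_2}{\sigma_1}$, $\beta_{13.24 \cdots k} = b_{13.24 \cdots k} \frac{\sigma_3}{\sigma_1}$, $\cdots$, $\beta_{1k.23 \cdots (k-1)} = b_{1k.23 \cdots (k-1)} \frac{\sigma_k}{\sigma_1}$

and

$t_j = \frac{x_j}{\sigma_j}$.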


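Now if $\Sigma A_i B_i = 0$, the set of numbers $A_1, A_2, \cdots, A_N$ is said to be orthogonal
to the set of numbers $B_1, B_2, \cdots, B_N$. Hence the conditions of equations (1)
may be described by saying that the values of $\lambda_2, \lambda_3, \cdots, \lambda_k$ must be such
that the set of residuals $x_{1i} - u_i$ is orthogonal to each of the k - 1 sets of
numbers $x_{2i}, x_{3i}, \cdots, x_{ki}$. But if the set of residuals is orthogonal to each
of these sets, it is orthogonal to any linear combination of them. Since the
set of u's is such a linear combination, we have

$\Sigma (x_{1i} - u_i) u_i = 0$

and hence

(3) $\Sigma x_{1i} u_i = \Sigma u_i^2$.

Since $u_i = b_{12.34 \cdots k} \, x_{2i} + \cdots + b_{1k.23 \cdots (k-1)} \, x_{ki}$, it follows at once
by multiplying both sides by $x_{1i}$ and summing that

(4) $\Sigma x_{1i} u_i = b_{12.34 \cdots k} \, \Sigma x_{1i} x_{2i} + \cdots + b_{1k.23 \cdots (k-1)} \, \Sigma x_{1i} x_{ki}$.

Writing

$\Sigma x_{1i} x_{2i} = N \sigma_1 \sigma_2 r_{12}$
$\Sigma x_{1i} x_{3i} = N \sigma_1 \sigma_3 r_{13}$
$\cdots$
$\Sigma x_{1i} x_{ki} = N \sigma_1 \sigma_k r_{1k}$,

noting the relations between the b's and the $\beta$'s expressed in equations (2),
and observing that we may write

$\Sigma x_{1i} u_i = \frac{(\Sigma x_{1i} u_i)^2}{\Sigma x_{1i} u_i} = \frac{(\Sigma x_{1i} u_i)^2}{\Sigma u_i^2}$

because of equation (3), we may therefore rewrite equation (4) as follows:

$\frac{(\Sigma x_{1i} u_i)^2}{\Sigma u_i^2} = \beta_{12.34 \cdots k} \frac{\sigma_1}{\sigma_2} N \sigma_1 \sigma_2 r_{12} + \beta_{13.24 \cdots k} \frac{\sigma_1}{\sigma_3} N \sigma_1 \sigma_3 r_{13} + \cdots + \beta_{1k.23 \cdots (k-1)} \frac{\sigma_1}{\sigma_k} N \sigma_1 \sigma_k r_{1k}$.

Now divide both sides by $\Sigma x_{1i}^2 = N \sigma_1^2$, obtaining

$r^2_{1.23 \cdots k} = \frac{(\Sigma x_{1i} u_i)^2}{\Sigma x_{1i}^2 \; \Sigma u_i^2} = \beta_{12.34 \cdots k} \, r_{12} + \beta_{13.24 \cdots k} \, r_{13} + \cdots + \beta_{1k.23 \cdots (k-1)} \, r_{1k}$.

This is the formula which was to be established.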
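A numerical check of the identity may be sketched as follows. It is an illustration only: the data are made up, the helper names are hypothetical, and the normal equations are solved by plain Gaussian elimination rather than by any method discussed in the note. The sketch computes $r^2_{1.23 \cdots k}$ once from its definition as the squared correlation between the $x_1$'s and the u's, and once as $\Sigma_j \beta_{1j} r_{1j}$ with $\beta_{1j} = b_{1j} \sigma_j / \sigma_1$.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def centered(v):
    mean = sum(v) / len(v)
    return [e - mean for e in v]

def multiple_correlation_check(data):
    """data[0] holds X_1; data[1:] hold X_2, ..., X_k.

    Returns (r^2 from the definition, sum of beta_1j * r_1j).
    """
    x = [centered(v) for v in data]
    preds = x[1:]
    n = len(x[0])
    # Normal equations for the regression coefficients b_1j.
    a = [[dot(p, q) for q in preds] for p in preds]
    rhs = [dot(x[0], p) for p in preds]
    b = solve_linear(a, rhs)
    u = [sum(bj * p[i] for bj, p in zip(b, preds)) for i in range(n)]
    r_def = dot(x[0], u) / math.sqrt(dot(x[0], x[0]) * dot(u, u))
    sigma = [math.sqrt(dot(v, v) / n) for v in x]
    total = 0.0
    for j, (bj, p) in enumerate(zip(b, preds), start=1):
        beta = bj * sigma[j] / sigma[0]
        r1j = dot(x[0], p) / (n * sigma[0] * sigma[j])
        total += beta * r1j
    return r_def ** 2, total

# Hypothetical data for k = 3 variables and N = 6 observations.
data = [
    [1.0, 3.0, 2.0, 5.0, 4.0, 7.0],
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
]
r2_definition, r2_formula = multiple_correlation_check(data)
print(r2_definition, r2_formula)
```

The two values agree to rounding error, as the orthogonality argument requires.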


Stanford University. 

















NOTES ON HOTELLING'S GENERALIZED T
By P. L. Hsu

1. Frequency Distribution When the Hypothesis Tested is Not True

a. The Problem. Let the simultaneous elementary probability law of the
k(f + 1) variables $z_i$ and $z'_{ir}$ (i = 1, 2, ..., k; r = 1, 2, ..., f) be

(1) $p(z, z') = (\sqrt{2\pi})^{-k(f+1)} \, |C|^{\frac{1}{2}(f+1)} \exp \left[ -\tfrac{1}{2} \sum_{i,j=1}^{k} c_{ij} \{ (z_i - \zeta_i)(z_j - \zeta_j) + v'_{ij} \} \right]$,

where

$v'_{ij} = \sum_{r=1}^{f} z'_{ir} z'_{jr}$, (i, j = 1, 2, ..., k),

C stands for the matrix $\| c_{ij} \|$ and $|C|$, the corresponding determinant. It is
required to find the elementary probability law of the statistic

$T = \frac{1}{|V'|} \sum_{i,j=1}^{k} V'_{ij} z_i z_j$,

where $|V'| = |v'_{ij}|$ and $V'_{ij}$ denotes the cofactor of the element $v'_{ij}$ in the matrix
$\| v'_{ij} \|$.

The quantity $\sqrt{fT}$ is a generalization of "Student's" t considered by Hotelling
[1]*. It is an appropriate criterion to test the hypothesis, say $H_0$, that the $\zeta_i$ in
the parent population as given by (1) all vanish. The distribution of T when
the hypothesis $H_0$ is true has already been obtained by Hotelling. But our
knowledge of the test is hardly complete unless we know also the distribution of
T when the $\zeta_i$ do not all vanish. Indeed, only such a knowledge can enable us to
control the risk of error of the second kind, i.e. of failure to detect that $H_0$ is
untrue [3, 4].

b. The Solution. Let H be a k × k non-singular matrix such that H'CH = I,
the unit matrix, where H' denotes the transposed matrix of H. Let the sets of
variables $(z_1, z_2, \cdots, z_k)$ and $(z'_{1r}, z'_{2r}, \cdots, z'_{kr})$ (r = 1, 2, ..., f) be subject
to the same collineation by means of H, so that

$\| z_1, z_2, \cdots, z_k \| = \| t_1, t_2, \cdots, t_k \| \cdot H'$,
$\| z'_{1r}, z'_{2r}, \cdots, z'_{kr} \| = \| t'_{1r}, t'_{2r}, \cdots, t'_{kr} \| \cdot H'$ (r = 1, 2, ..., f),

where the $t_i$ and $t'_{ir}$ are the new variables. Let further the quantities $\tau_i$ be defined
by

(2) $\| \zeta_1, \zeta_2, \cdots, \zeta_k \| = \| \tau_1, \tau_2, \cdots, \tau_k \| \cdot H'$.

* References are given at the end of the paper.

Then, as is easy to verify, the simultaneous distribution of the new variables
will be given by

(3) $p(t, t') = (\sqrt{2\pi})^{-k(f+1)} \exp \left[ -\tfrac{1}{2} \sum_{i=1}^{k} \{ (t_i - \tau_i)^2 + u'_{ii} \} \right]$,

while the statistic T, as a function of the t's, retains the original form:

(4) $T = \frac{1}{|U'|} \sum_{i,j=1}^{k} U'_{ij} t_i t_j$,

where

$u'_{ij} = \sum_{r=1}^{f} t'_{ir} t'_{jr}$ (i, j = 1, 2, ..., k),

$|U'| = |u'_{ij}|$, and $U'_{ij}$ is the cofactor of the element $u'_{ij}$ in the matrix $\| u'_{ij} \|$.

By virtue of (2) we have the following relation between the old and new parametric
constants:

(5) $\sum_{i,j=1}^{k} c_{ij} \zeta_i \zeta_j = \sum_{i=1}^{k} \tau_i^2$.

Our problem is thus reduced to finding the derived distribution of T defined by
(4) from the parent population given by (3).

We solve this problem by obtaining an expression for the Laplace integral
$\mathcal{E}(e^{-\alpha T})$, i.e. the mathematical expectation of $e^{-\alpha T}$ for real non-negative $\alpha$. A few
words are perhaps needed to explain the fact that the Laplace transform of an
elementary probability law determines the latter uniquely except on a null set
of points. If f(x) is an elementary probability law which vanishes for all negative
x and if

$\varphi(\alpha) = \int_0^{\infty} e^{-\alpha x} f(x) \, dx$ for $\alpha \ge 0$,

then, letting c be any fixed positive constant, we have

$\varphi^{(h)}(\alpha) = (-1)^h \int_0^{\infty} x^h e^{-\alpha x} f(x) \, dx$

for all $\alpha \ge c$. We get therefore

$m_h = \int_0^{\infty} x^h e^{-cx} f(x) \, dx = (-1)^h \varphi^{(h)}(c)$, (h = 0, 1, 2, ...),

the definite integral being obviously finite for all $h \ge 0$. Now a sufficient condition
that the set of numbers $m_h$ determines the function $e^{-cx} f(x)$ uniquely,
with the exception of a null set at most, is that the latter multiplied by 
be summable $(0, \infty)$ for some positive h (cf. [6], p. 320). Since this condition is
trivially satisfied by the function $e^{-cx} f(x)$, this function, and therefore f(x) itself,
must be uniquely determined by the $m_h$. In other words, f(x) is uniquely
determined by its Laplace transform $\varphi(\alpha)$. We now proceed to find the Laplace
integral $\mathcal{E}(e^{-\alpha T})$.

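Introduce the function

$$g(t, t', \theta, \alpha) = (\sqrt{\pi})^{-k}\,|U'|^{1/2} \exp\Big\{-\sum_{i,j=1}^{k} u'_{ij}\theta_i\theta_j + 2i\sqrt{\alpha}\sum_{i=1}^{k} t_i\theta_i\Big\}$$

and write

$$F(t, t', \theta, \alpha) = p(t, t')\, g(t, t', \theta, \alpha),$$

where all the arguments take real values only. For any functions $\varphi(\theta)$ and $\psi(t, t')$ let us write

$$\int \varphi(\theta)\,d\theta = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \varphi(\theta)\,d\theta_1\cdots d\theta_k,$$

$$\int \psi(t, t')\,d(t, t') = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \psi(t, t')\,\prod_{i} dt_i \prod_{i,r} dt'_{ir}.$$

We have

$$\int d(t, t') \int |F(t, t', \theta, \alpha)|\,d\theta = \int p(t, t')\,d(t, t') \int g(t, t', \theta, 0)\,d\theta = 1,$$

whence we know that

(6) $$\int d(t, t') \int F\,d\theta = \int d\theta \int F\,d(t, t').$$

On the left-hand side of (6) we find

$$\int p(t, t')\,d(t, t') \int g(t, t', \theta, \alpha)\,d\theta = \int e^{-\alpha T} p(t, t')\,d(t, t') = \mathcal{E}(e^{-\alpha T}),$$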
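while for the integral on the right-hand side of (6) we have

(7) $$\int F\,d(t, t') = (\sqrt{2\pi})^{-k(f+1)}(\sqrt{\pi})^{-k} \int \exp\Big\{-\tfrac{1}{2}\sum_{i=1}^{k}(t_i - \tau_i)^2 + 2i\sqrt{\alpha}\sum_{i=1}^{k} t_i\theta_i\Big\}\,dt \times \int |U'|^{1/2} \exp\Big[-\tfrac{1}{2}\sum_{i,j=1}^{k}(2\theta_i\theta_j + \delta_{ij})\,u'_{ij}\Big]\,dt',$$

where we mean by the $\delta_{ij}$ the quantities

$$\delta_{ij} = 0 \text{ for } i \ne j; \qquad \delta_{ij} = 1 \text{ for } i = j \qquad (i, j = 1, 2, \ldots, k).$$

In the equation (7) the integral with respect to the $t_i$ is immediately written down as

$$(\sqrt{2\pi})^{k} \exp\Big\{\sum_{i=1}^{k}\big(2i\sqrt{\alpha}\,\tau_i\theta_i - 2\alpha\theta_i^2\big)\Big\}.$$

As to the integral with respect to the $t'_{ir}$, we may evaluate it by the method by which Wilks [7] evaluated the moments of the generalised variance. The result is

$$(\sqrt{2\pi})^{kf}(\sqrt{2})^{k}\,\frac{\Gamma(\tfrac{1}{2}(f+1))}{\Gamma(\tfrac{1}{2}(f+1-k))}\,\big|2\theta_i\theta_j + \delta_{ij}\big|^{-\frac{1}{2}(f+1)}.$$

Making the substitution into (7) we get, after necessary reductions,

$$\int F\,d(t, t') = \Big(\sqrt{\tfrac{2}{\pi}}\Big)^{k}\frac{\Gamma(\tfrac{1}{2}(f+1))}{\Gamma(\tfrac{1}{2}(f+1-k))}\,\big|2\theta_i\theta_j + \delta_{ij}\big|^{-\frac{1}{2}(f+1)} \exp\Big\{\sum_{i=1}^{k}\big(2i\sqrt{\alpha}\,\tau_i\theta_i - 2\alpha\theta_i^2\big)\Big\},$$

whence, noticing that $|2\theta_i\theta_j + \delta_{ij}| = 1 + 2\sum_i \theta_i^2$,

(8) $$\mathcal{E}(e^{-\alpha T}) = \Big(\sqrt{\tfrac{2}{\pi}}\Big)^{k}\frac{\Gamma(\tfrac{1}{2}(f+1))}{\Gamma(\tfrac{1}{2}(f+1-k))} \int \Big(1 + 2\sum_{i=1}^{k}\theta_i^2\Big)^{-\frac{1}{2}(f+1)} \exp\Big\{\sum_{i=1}^{k}\big(2i\sqrt{\alpha}\,\tau_i\theta_i - 2\alpha\theta_i^2\big)\Big\}\,d\theta.$$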
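The device behind the derivation above is that integrating an auxiliary Gaussian kernel over θ reproduces the factor e^{-αT}. The following is a minimal numerical sketch for the one-variable case only (k = 1), assuming the kernel has the form π^{-1/2} u^{1/2} exp(-uθ² + 2i√α tθ), so that T = t²/u; the function name, the integration window and the sample values are illustrative choices, not part of the paper.

```python
import math

def g_integral_k1(alpha, t, u, half_width=12.0, steps=20000):
    """Trapezoidal integral over theta of the k = 1 kernel.

    Assumed kernel: pi**-0.5 * sqrt(u) * exp(-u*theta**2 + 2i*sqrt(alpha)*t*theta).
    Its imaginary part is odd in theta, so only the cosine part contributes.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for n in range(steps + 1):
        theta = -half_width + n * h
        weight = 0.5 if n in (0, steps) else 1.0
        total += weight * math.exp(-u * theta * theta) * math.cos(
            2.0 * math.sqrt(alpha) * t * theta)
    return total * h * math.sqrt(u / math.pi)

alpha, t, u = 0.7, 1.3, 2.5            # illustrative values only
numeric = g_integral_k1(alpha, t, u)
closed_form = math.exp(-alpha * t * t / u)   # e^{-alpha*T} with T = t**2 / u
print(numeric, closed_form)
```

The two printed numbers should agree closely, which is the scalar instance of the identity that lets the expectation of e^{-αT} be exchanged for an integral over θ.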



Equation (8) gives the Laplace transform of the elementary probability law, p(T), of T. There is no essential difficulty in getting p(T) by inversion directly from (8). Nevertheless, it may be of interest to get p(T) indirectly by identifying the right-hand side of (8) with the Laplace transform of another elementary probability law which is otherwise known. For this purpose consider the simultaneous elementary probability law

?(*, v) - (VS)'^' [-5 £ (■ - ft) 1 - 5 £ |}] 


and let us find the derived distribution of the statistic 



As before, we introduce the function 
0(a, V, 0, a) « W*i(Y h exp |Vg (j£, 0* A? + 

write

$$F(x, y, \theta, \alpha) = p(x, y)\, g(x, y, \theta, \alpha)$$

and ascertain that

(9) $$\int d(x, y) \int F\,d\theta = \int d\theta \int F\,d(x, y).$$

On the left-hand side of (9) we find

$$\int e^{-\alpha L}\, p(x, y)\,d(x, y) = \mathcal{E}(e^{-\alpha L}),$$




while for the integral on the right-hand side of (9), we have 
j Fd(jxt,y) = (>/Sr)™ Wl+/,> exp 

X J exp {*! 4- 2(*O0, - f<)®(] jcte 

x •>)“■—[4(* s ■os <*> 




i _\—l C/i+/s) 


«p[— 5 ^ <«*«? + SiafeOf)] 


Writing

(10) $$f_1 = k, \qquad f_2 = f + 1 - k,$$

we get finally

(11)


(v^)~*rGCf +1)) 

r(«/+x -*)) 




From the identity of (8) and (11) we conclude that T is distributed exactly the same as L with the appropriate "degrees of freedom" $f_1$ and $f_2$ given by (10). But the elementary probability law of L has already been derived by P. C. Tang [6]. Using his result we immediately write down the elementary probability law of T:

(12) $$p(T) = e^{-\lambda} \sum_{h=0}^{\infty} \frac{\lambda^h}{h!}\, \frac{T^{\frac{1}{2}f_1 + h - 1}\,(1 + T)^{-\frac{1}{2}(f_1 + f_2) - h}}{B(\tfrac{1}{2}f_1 + h,\ \tfrac{1}{2}f_2)},$$

where $f_1$ and $f_2$ are given by (10) and

(13) $$\lambda = \frac{1}{2}\sum_{i=1}^{k} \tau_i^2 = \frac{1}{2}\sum_{i,j=1}^{k} c_{ij}\zeta_i\zeta_j$$

in accordance with (5). The tables of probability integrals prepared by Tang can, of course, be used to suit our purpose.
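As a hedged numerical illustration of the series form of p(T), the sketch below evaluates a Poisson-weighted mixture of beta-type densities in T (the form in which (12) is usually written) and checks that it integrates to 1. The function names, the truncation at 80 terms, the midpoint rule and the sample values k = 3, f = 12, λ = 2 are assumptions made for the example; this is not a substitute for Tang's tables.

```python
import math

def t_density(T, f1, f2, lam, terms=80):
    """Truncated series for p(T) at T > 0 (lam > 0 is assumed here)."""
    b = 0.5 * f2
    total = 0.0
    for h in range(terms):
        a = 0.5 * f1 + h
        log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
        log_term = (-lam + h * math.log(lam) - math.lgamma(h + 1)
                    - log_beta + (a - 1.0) * math.log(T)
                    - (a + b) * math.log1p(T))
        total += math.exp(log_term)
    return total

def total_probability(f1, f2, lam, steps=2000):
    """Midpoint rule in u = T/(1+T), which maps T in (0, inf) onto (0, 1)."""
    acc = 0.0
    for n in range(steps):
        u = (n + 0.5) / steps
        T = u / (1.0 - u)
        acc += t_density(T, f1, f2, lam) / (1.0 - u) ** 2   # dT = du/(1-u)^2
    return acc / steps

k, f, lam = 3, 12, 2.0          # illustrative values only
f1, f2 = k, f + 1 - k           # degrees of freedom as in (10)
area = total_probability(f1, f2, lam)
print(area)
```

The printed area should be very close to 1, as required of a probability law.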



2. An Optimum Property of the T-Test. To any reader familiar with the Neyman-Pearson theory of testing statistical hypotheses [3, 4], the theorem stated below may be of considerable interest.

Denote by W the k(f + 1)-dimensional space of the $z_i$ and $z'_{ir}$, and let w be any region in W which may possibly serve as a critical region for the rejection of the hypothesis $H_0$. Let us speak of a critical region w as belonging to the class D if w satisfies the following condition:

(14) $$\int_w p(z, z')\,d(z, z') = \epsilon + \tfrac{1}{2}\alpha \sum_{i,j=1}^{k} c_{ij}\zeta_i\zeta_j + R,$$

where $\epsilon < 1$ is a positive constant independent of the $\zeta_i$, $c_{ij}$ and the region w, $\alpha$ is a constant depending on w only, but not on the $\zeta_i$ or $c_{ij}$, and where R for any given set of values of the $c_{ij}$ is an infinitesimal of at least the third order as all the $\zeta_i$ tend to zero.

Theorem. Of all the regions belonging to the class D, the particular region which gives the largest possible value to the coefficient $\alpha$ in the equation (14) is the region defined by $T \ge T_\epsilon$, where $T_\epsilon$ is a constant so determined that the probability, when all $\zeta_i$ vanish, of the observed T being not less than $T_\epsilon$ is exactly $\epsilon$.

The significance of the theorem is clear. Every critical region belonging to the class D serves as an unbiased exact test of the hypothesis $H_0$, $\epsilon$ being the preassigned chance of rejecting $H_0$ if it is true. Further, as is seen from (14), as the $\zeta_i$ start to depart from zero, the increased chance of rejecting $H_0$ due to its falsehood is approximately proportional to the quantity $\sum c_{ij}\zeta_i\zeta_j$. The coefficient $\alpha$ therefore measures the power of the critical region w to detect the falsehood of $H_0$, at least when the departure of the $\zeta_i$ from zero is small. Our theorem asserts that in this particular sense the T-test is the most powerful of its kind.

The method of proof is very much the same as that by which Neyman and Pearson proved some of their general theorems concerning unbiased tests. However, as the present theorem has not yet been contained in their more general results, we shall give it a full proof without referring, save in one occasion, to these authors.

Proof. Write

$$z_i z_j + v'_{ij} = s_{ij} \qquad (i, j = 1, 2, \ldots, k),$$

(15) $$p_0(z, z') = (\sqrt{2\pi})^{-k(f+1)}\, C^{\frac{1}{2}(f+1)} \exp\Big[-\tfrac{1}{2}\sum_{i,j=1}^{k} c_{ij} s_{ij}\Big],$$

and denote by $p_0(s)$ the simultaneous elementary probability law of the variables $s_{ij}$ derived from (15). Let $W_1$ be the domain of all possible positions of the point $(s_{11}, s_{12}, \ldots, s_{kk})$ in the $\tfrac{1}{2}k(k+1)$-dimensional space.

We know, although we omit the proof of it, that there is no elementary probability law of the variables $s_{ij}$ other than $p_0(s)$ which has the same moments of all orders as those derived from $p_0(z, z')$. It then follows that if $g(s)$ be any summable function of the $s_{ij}$ and if

(16) $$\int_{W_1} \Big(\prod_{i \le j} s_{ij}^{\,r_{ij}}\Big)\, g(s)\, p_0(s)\,ds = 0$$

for all positive integers $r_{ij}$ or zero, then we must have $g(s) \equiv 0$ except perhaps on a null set of points.

It follows therefore that the identity

(17) $$\int_{W_1} g(s)\, p_0(s)\,ds \equiv 0$$

implies the identity $g(s) \equiv 0$ provided $g(s)$ does not involve the parameters $c_{ij}$. For, substituting for $p_0(s)$ its expression as given by Wishart [8] we shall have

(18) $$K \int_{W_1} g(s)\, p_0(s)\,ds = \int_{W_1} g(s)\,|S|^{\frac{1}{2}(f-k)} \exp\Big[-\tfrac{1}{2}\sum_{i,j=1}^{k} c_{ij}s_{ij}\Big]\,ds \equiv 0,$$

where $|S| = |s_{ij}|$ and K is some constant. Differentiating (18) successively with respect to the $c_{ij}$ and dividing the results by K, we shall regain the equations (16). Hence it follows that $g(s) \equiv 0$.

This being established, let w be any region belonging to D and rewrite the equation (14), so that

(19) $$(\sqrt{2\pi})^{-k(f+1)}\, C^{\frac{1}{2}(f+1)} \int_w \exp\Big[-\tfrac{1}{2}\sum_{i,j=1}^{k} c_{ij}\big\{(z_i - \zeta_i)(z_j - \zeta_j) + v'_{ij}\big\}\Big]\,d(z, z') = \epsilon + \tfrac{1}{2}\alpha\sum_{i,j=1}^{k} c_{ij}\zeta_i\zeta_j + R.$$

Setting all the $\zeta_i$ to zero in both sides of (19), we have

(20) $$\int_w p_0(z, z')\,d(z, z') = \epsilon$$

identically in the $c_{ij}$. Differentiating (19) once with respect to $\zeta_i$ and afterwards setting all the $\zeta_i$ to zero, we easily get

(21) $$\int_w z_i\, p_0(z, z')\,d(z, z') = 0 \qquad (i = 1, 2, \ldots, k)$$

for all possible values of the $c_{ij}$.

Finally, differentiating (19) with respect to $\zeta_i$ and then to $\zeta_j$ and putting all $\zeta_i = 0$ in the result we obtain

$$\int_w \Big[\Big(\sum_{h=1}^{k} c_{ih}z_h\Big)\Big(\sum_{l=1}^{k} c_{jl}z_l\Big) - c_{ij}\Big]\, p_0(z, z')\,d(z, z') = \alpha c_{ij} \qquad (i, j = 1, 2, \ldots, k),$$

whence, remembering (20),

(22) $$\sum_{h,l=1}^{k} c_{ih}c_{jl}\,q_{hl} = \beta c_{ij} \qquad (i, j = 1, 2, \ldots, k),$$

in which we denote by $\beta = \alpha + \epsilon$ and

$$q_{hl} = \int_w z_h z_l\, p_0(z, z')\,d(z, z') \qquad (h, l = 1, 2, \ldots, k).$$

If we denote by Q the matrix of order k formed of the elements $q_{hl}$, we see that (22) may be written as

$$CQC = \beta C,$$

whence, since C has its inverse matrix, $C^{-1}$,

$$Q = \beta C^{-1},$$

i.e.,

(23) $$q_{ij} = \beta c_{ij}^{(-1)} \qquad (i, j = 1, 2, \ldots, k),$$

where $c_{ij}^{(-1)}$ denotes the element in the matrix $C^{-1}$ which corresponds to the element $c_{ij}$ in the matrix C.

Conditions (20), (21) and (23) are necessary for the region w to belong to the class D. They are evidently also sufficient.

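Let us evaluate the integrals in (20), (21) and the $q_{ij}$ by first evaluating the surface integrals on any surface, say $G(s)$, on which all the $s_{ij}$ have constant values, and then integrating the results with respect to the $s_{ij}$ over a region, say $\omega$, of the $s_{ij}$ contained in $W_1$. Thus we may write (20), (21) and (23) in the form

(24) $$\int_\omega f(s)\,p_0(s)\,ds = \epsilon, \qquad \int_\omega g_i(s)\,p_0(s)\,ds = 0, \qquad \int_\omega \varphi_{ij}(s)\,p_0(s)\,ds = \beta c_{ij}^{(-1)} \qquad (i, j = 1, 2, \ldots, k),$$

where

$$f(s) = \frac{1}{p_0(s)}\int_{G(s)} p_0(z, z')\,dG(s), \quad g_i(s) = \frac{1}{p_0(s)}\int_{G(s)} z_i\,p_0(z, z')\,dG(s), \quad \varphi_{ij}(s) = \frac{1}{p_0(s)}\int_{G(s)} z_i z_j\,p_0(z, z')\,dG(s).$$

It is readily verified that the function $p_0(z, z')/p_0(s)$ is free from the parameters $c_{ij}$, and consequently so are the functions $f(s)$, $g_i(s)$, $\varphi_{ij}(s)$. Besides, we can extend the definition of these functions in the whole domain $W_1$ by assigning them the value zero outside of the region $\omega$. Doing this we can now write the equations (24) as

(25) $$\int_{W_1} \big(f(s) - \epsilon\big)\,p_0(s)\,ds = 0, \qquad \int_{W_1} g_i(s)\,p_0(s)\,ds = 0, \qquad \int_{W_1} \big[\varphi_{ij}(s) - \gamma s_{ij}\big]\,p_0(s)\,ds = 0 \qquad (i, j = 1, 2, \ldots, k),$$

where $\gamma = \dfrac{1}{f+1}\,\beta$.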



Now all the equations (25) are of the form (17); consequently, according to the already established result and remembering the definitions of the functions $f(s)$, $g_i(s)$ and $\varphi_{ij}(s)$, we must have

(26) $$\int_{G(s)} p_0(z, z')\,dG(s) = \epsilon\,p_0(s),$$

(27) $$\int_{G(s)} z_i\,p_0(z, z')\,dG(s) = 0,$$

(28) $$\int_{G(s)} z_i z_j\,p_0(z, z')\,dG(s) = \gamma\,s_{ij}\,p_0(s)$$

in the whole domain $W_1$.

Hence the most general region belonging to the class D is constructed as follows. On any surface $s_{ij}$ = const. (i, j = 1, 2, ..., k) we take an areal region such that it satisfies the equations (26)-(28); we then allow the $s_{ij}$ to vary in the whole domain $W_1$. Equations (28) may now be replaced by

(28') $$\int_{G(s)} \Big(\frac{z_i z_j}{s_{ij}} - \frac{z_1^2}{s_{11}}\Big)\,p_0(z, z')\,dG(s) = 0 \qquad (i, j = 1, 2, \ldots, k).$$

Let us call $w_0$ the region defined by $T \ge T_\epsilon$. Since $w_0$ belongs to the class D (cf. (12)), its cross section, say $G_0(s)$, by any surface $s_{ij}$ = const. (i, j = 1, 2, ..., k) must satisfy the equations (26), (27) and (28'). Since $\gamma = \dfrac{1}{f+1}(\alpha + \epsilon)$, all we have to prove now is that among all the areal regions $G(s)$ satisfying the equations (26), (27) and (28') it is the region $G_0(s)$ that gives the largest possible value to $\gamma\,p_0(s)$. Now

(29) $$\gamma\,p_0(s) = \int_{G(s)} \frac{z_1^2}{s_{11}}\,p_0(z, z')\,dG(s),$$

and, according to a Lemma of Neyman and Pearson [3, p. 10], the right-hand side of (29) will attain its maximum value if $G(s)$ is defined by an inequality of the form

(30) $$\frac{z_1^2}{s_{11}} \ge \sum_{i,j=1}^{k} a_{ij}\Big(\frac{z_i z_j}{s_{ij}} - \frac{z_1^2}{s_{11}}\Big) + \sum_{i=1}^{k} b_i z_i + c,$$

where the $a_{ij}$, $b_i$ and $c$ are constants so determined as to enable the region $G(s)$ to satisfy the equations (26)-(28). We shall show presently that the region $G_0(s)$ is defined by such an inequality.

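The inequality $T \ge T_\epsilon$ may be written as

(31) $$\sum_{i,j=1}^{k} s_{ij}^{(-1)} z_i z_j \ge \frac{T_\epsilon}{1 + T_\epsilon},$$

where $s_{ij}^{(-1)}$ denotes the (i, j)th element in the inverse matrix of $\|s_{ij}\|$. The region $G_0(s)$ is therefore defined by the same inequality (31) in which we regard the $s_{ij}$ as constants.

If we put

$$a_{ij} = -\frac{1}{k}\,s_{ij}\,s_{ij}^{(-1)}, \qquad b_i = 0, \qquad c = \frac{1}{k}\,\frac{T_\epsilon}{1 + T_\epsilon} \qquad (i, j = 1, 2, \ldots, k)$$

in (30) we can easily reduce the inequality (30) into (31).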

The proof is now complete. 
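The passage from T ≥ T_ε to an inequality in the s_ij rests on an algebraic identity: with T = z'V'⁻¹z and S = V' + zz', one has z'S⁻¹z = T/(1 + T). The sketch below checks this numerically on made-up numbers; the small elimination solver and the names `solve` and `quad_form_inv` are helpers written for this illustration only.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def quad_form_inv(A, z):
    """Return the quadratic form z' A^{-1} z."""
    w = solve(A, z)
    return sum(zi * wi for zi, wi in zip(z, w))

# Illustrative data: z and a positive definite V' (not taken from the paper).
z = [0.8, -1.1, 0.4]
V = [[4.0, 1.0, 0.5], [1.0, 3.0, -0.2], [0.5, -0.2, 2.0]]
S = [[V[i][j] + z[i] * z[j] for j in range(3)] for i in range(3)]

T = quad_form_inv(V, z)
lhs = quad_form_inv(S, z)
print(lhs, T / (1.0 + T))
```

Because T/(1 + T) increases with T, a lower bound on T is equivalent to a lower bound on the quadratic form in the inverse of the s_ij.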

3. Note on Applications of T. It is already known that the T-test may be used for the following purposes (a) and (b):

(a) Given a k-variate normal surface

$$p(x) = (\sqrt{2\pi})^{-k}\,C^{1/2} \exp\Big[-\tfrac{1}{2}\sum_{i,j=1}^{k} c_{ij}(x_i - \xi_i)(x_j - \xi_j)\Big]$$

with the unknown $\xi_i$ and $c_{ij}$, n observations

$$(x_{1l}, x_{2l}, \ldots, x_{kl}) \qquad (l = 1, 2, \ldots, n)$$

having been made, it is required to test the hypothesis that the $\xi_i$ have the particular values $\xi_i^0$ for i = 1, 2, ..., k.

Here we use the T-test with

$$z_i = \sqrt{n}\,(\bar{x}_i - \xi_i^0), \qquad v'_{ij} = \sum_{l=1}^{n} (x_{il} - \bar{x}_i)(x_{jl} - \bar{x}_j),$$

$$\zeta_i = \sqrt{n}\,(\xi_i - \xi_i^0), \qquad f = n - 1,$$

where

$$\bar{x}_i = \frac{1}{n}\sum_{l=1}^{n} x_{il}.$$
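Application (a) can be sketched in code as follows, assuming the usual reading of the definitions: z_i = √n (x̄_i - ξ_i⁰), v'_ij the matrix of sums of products about the sample means, T = z'V'⁻¹z and f = n - 1. The function names and the toy data are illustrative assumptions; for k = 1 the statistic reduces to t²/f, where t is Student's ratio.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def one_sample_T(rows, xi0):
    """Return (T, f) for n observations `rows` and hypothetical means `xi0`."""
    n, k = len(rows), len(xi0)
    mean = [sum(row[i] for row in rows) / n for i in range(k)]
    z = [math.sqrt(n) * (mean[i] - xi0[i]) for i in range(k)]
    v = [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in rows)
          for j in range(k)] for i in range(k)]
    w = solve(v, z)                      # w = V'^{-1} z
    return sum(a * b for a, b in zip(z, w)), n - 1

# Univariate toy sample: mean 2.5, sum of squares about the mean 5.
T1, f1 = one_sample_T([[1.0], [2.0], [3.0], [4.0]], [0.0])
# Bivariate toy sample, before and after rescaling the second variable.
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
T2, _ = one_sample_T(rows, [0.0, 0.0])
T2_scaled, _ = one_sample_T([[a, 10.0 * b] for a, b in rows], [0.0, 0.0])
print(T1, f1, T2, T2_scaled)
```

The last two values coincide because T does not change when a variable is rescaled.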

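(b) Given two k-variate normal surfaces

$$p_1(x) = (\sqrt{2\pi})^{-k}\,C^{1/2} \exp\Big[-\tfrac{1}{2}\sum_{i,j=1}^{k} c_{ij}(x_i - \xi_i)(x_j - \xi_j)\Big],$$

$$p_2(y) = (\sqrt{2\pi})^{-k}\,C^{1/2} \exp\Big[-\tfrac{1}{2}\sum_{i,j=1}^{k} c_{ij}(y_i - \eta_i)(y_j - \eta_j)\Big],$$

where the $c_{ij}$ are common to the two surfaces while all the $\xi_i$, $\eta_i$, $c_{ij}$ are unknown. Samples of $n_1$ and $n_2$ having been drawn respectively from the two populations, to test the hypothesis that $\xi_i = \eta_i$ for all i.

Let the samples be

$$(x_{1l}, x_{2l}, \ldots, x_{kl}) \qquad (l = 1, 2, \ldots, n_1),$$

$$(y_{1h}, y_{2h}, \ldots, y_{kh}) \qquad (h = 1, 2, \ldots, n_2).$$

Let

$$\bar{x}_i = \frac{1}{n_1}\sum_{l=1}^{n_1} x_{il}, \qquad \bar{y}_i = \frac{1}{n_2}\sum_{h=1}^{n_2} y_{ih} \qquad (i = 1, 2, \ldots, k).$$

We use the T-test with

$$z_i = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\,(\bar{x}_i - \bar{y}_i), \qquad v'_{ij} = \sum_{l=1}^{n_1}(x_{il} - \bar{x}_i)(x_{jl} - \bar{x}_j) + \sum_{h=1}^{n_2}(y_{ih} - \bar{y}_i)(y_{jh} - \bar{y}_j),$$

$$\zeta_i = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\,(\xi_i - \eta_i), \qquad f = n_1 + n_2 - 2 \qquad (i, j = 1, 2, \ldots, k).$$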
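A minimal sketch of application (b), under the assumption that the scale factor in z_i is √(n₁n₂/(n₁+n₂)) and that v'_ij pools the sums of products of both samples about their own means. The function names and the toy samples are invented for the illustration; in the univariate case T·f equals the square of the classical two-sample t.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def two_sample_T(xs, ys):
    """Return (T, f) for two samples given as lists of k-vectors."""
    n1, n2, k = len(xs), len(ys), len(xs[0])
    mx = [sum(r[i] for r in xs) / n1 for i in range(k)]
    my = [sum(r[i] for r in ys) / n2 for i in range(k)]
    scale = math.sqrt(n1 * n2 / (n1 + n2))
    z = [scale * (mx[i] - my[i]) for i in range(k)]
    v = [[sum((r[i] - mx[i]) * (r[j] - mx[j]) for r in xs)
          + sum((r[i] - my[i]) * (r[j] - my[j]) for r in ys)
          for j in range(k)] for i in range(k)]
    w = solve(v, z)                      # w = V'^{-1} z
    return sum(a * b for a, b in zip(z, w)), n1 + n2 - 2

T, f = two_sample_T([[1.0], [2.0], [3.0]], [[2.0], [4.0], [6.0], [8.0]])
print(T, f, T * f)   # T * f is the squared two-sample t for k = 1
```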

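A third application of T, which appears to be novel, is the following:

(c) Given a (k + 1)-variate normal surface

$$p(x) = (\sqrt{2\pi})^{-(k+1)}\,D^{1/2} \exp\Big[-\tfrac{1}{2}\sum_{i,j=1}^{k+1} d_{ij}(x_i - \xi_i)(x_j - \xi_j)\Big], \qquad D = |d_{ij}|,$$

where the $\xi_i$ and $d_{ij}$ are all unknown, n observations

$$(x_{1l}, x_{2l}, \ldots, x_{k+1,l}) \qquad (l = 1, 2, \ldots, n)$$

having been made, to test the hypothesis that all the $\xi_i$ are equal.

If we put

$$y_i = x_i - x_{k+1} \qquad (i = 1, 2, \ldots, k),$$

then we have a k-variate normal surface for the variables $y_i$,

$$p(y) = (\sqrt{2\pi})^{-k}\,C^{1/2} \exp\Big[-\tfrac{1}{2}\sum_{i,j=1}^{k} c_{ij}(y_i - \eta_i)(y_j - \eta_j)\Big],$$

where $\eta_i = \xi_i - \xi_{k+1}$ (i = 1, 2, ..., k). Thus the problem is reduced to testing the hypothesis that $\eta_i = 0$ for i = 1, 2, ..., k and therefore belongs to the type (a). Write

$$y_{il} = x_{il} - x_{k+1,l} \qquad (i = 1, 2, \ldots, k;\ l = 1, 2, \ldots, n),$$

$$z_i = \sqrt{n}\,\bar{y}_i, \qquad v'_{ij} = \sum_{l=1}^{n}(y_{il} - \bar{y}_i)(y_{jl} - \bar{y}_j),$$

$$\zeta_i = \sqrt{n}\,\eta_i, \qquad f = n - 1 \qquad (i, j = 1, 2, \ldots, k).$$

Although there are no simple expressions for the $c_{ij}$, there is one for the parameter $\sum c_{ij}\zeta_i\zeta_j$, on which alone the distribution of T depends. We have indeed

$$\sum_{i,j=1}^{k} c_{ij}\zeta_i\zeta_j = -\frac{n}{\Delta}\begin{vmatrix} \sigma_{11} & \cdots & \sigma_{1,k+1} & \xi_1 & 1 \\ \vdots & & \vdots & \vdots & \vdots \\ \sigma_{k+1,1} & \cdots & \sigma_{k+1,k+1} & \xi_{k+1} & 1 \\ \xi_1 & \cdots & \xi_{k+1} & 0 & 0 \\ 1 & \cdots & 1 & 0 & 0 \end{vmatrix},$$

where

$$\Delta = \begin{vmatrix} \sigma_{11} & \cdots & \sigma_{1,k+1} & 1 \\ \vdots & & \vdots & \vdots \\ \sigma_{k+1,1} & \cdots & \sigma_{k+1,k+1} & 1 \\ 1 & \cdots & 1 & 0 \end{vmatrix},$$

and where $\sigma_{ij}$ is the covariance between $x_i$ and $x_j$.

Expressing T in terms of the original variables x, we have

$$T = -\frac{1}{\Delta'}\begin{vmatrix} s_{11} & \cdots & s_{1,k+1} & \bar{x}_1 & 1 \\ \vdots & & \vdots & \vdots & \vdots \\ s_{k+1,1} & \cdots & s_{k+1,k+1} & \bar{x}_{k+1} & 1 \\ \bar{x}_1 & \cdots & \bar{x}_{k+1} & 0 & 0 \\ 1 & \cdots & 1 & 0 & 0 \end{vmatrix},$$

where

$$\Delta' = \begin{vmatrix} s_{11} & \cdots & s_{1,k+1} & 1 \\ \vdots & & \vdots & \vdots \\ s_{k+1,1} & \cdots & s_{k+1,k+1} & 1 \\ 1 & \cdots & 1 & 0 \end{vmatrix},$$

and where

$$\bar{x}_i = \frac{1}{n}\sum_{l=1}^{n} x_{il}, \qquad s_{ij} = \frac{1}{n}\sum_{l=1}^{n}(x_{il} - \bar{x}_i)(x_{jl} - \bar{x}_j) \qquad (i, j = 1, 2, \ldots, k+1).$$

Therefore T is independent of which variable has been taken as the (k + 1)st.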
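The closing remark, that T does not depend on which variable is subtracted from the others, can be checked numerically. The sketch below computes T of type (a) from the differences against a chosen reference variable and compares the choices; the data, the helper `solve` and the function name `t_from_differences` are illustrative assumptions rather than anything given in the paper.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def t_from_differences(rows, ref):
    """T for the hypothesis of equal means, using y_i = x_i - x_ref."""
    n = len(rows)
    others = [i for i in range(len(rows[0])) if i != ref]
    ys = [[row[i] - row[ref] for i in others] for row in rows]
    k = len(others)
    mean = [sum(y[i] for y in ys) / n for i in range(k)]
    z = [math.sqrt(n) * m for m in mean]
    v = [[sum((y[i] - mean[i]) * (y[j] - mean[j]) for y in ys)
          for j in range(k)] for i in range(k)]
    w = solve(v, z)
    return sum(a * b for a, b in zip(z, w))

# Five made-up observations on k + 1 = 3 variables.
rows = [[1.0, 2.0, 0.5], [2.0, 1.5, 1.0], [3.0, 3.5, 2.5],
        [4.0, 3.0, 2.0], [5.5, 4.0, 4.5]]
t_ref_last = t_from_differences(rows, 2)
t_ref_first = t_from_differences(rows, 0)
t_ref_middle = t_from_differences(rows, 1)
print(t_ref_last, t_ref_first, t_ref_middle)
```

The three values agree up to rounding, since the sets of differences are nonsingular linear transformations of one another.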
University College, London.

REFERENCES 

[1] H. Hotelling, Ann. Math. Statist., Vol. 2 (1931), pp. 359-378.

GENERALIZATION OF THE INEQUALITY OF MARKOFF 

By A. Wald

1. Introduction. Denote by X a random variable and by $M_r$ the expected value $E|X - x_0|^r$ of $|X - x_0|^r$ for any integer r, where $x_0$ denotes a given real value. $M_r$ is also called the absolute moment of order r about the point $x_0$. For any positive number d, denote by $P(-d < X - x_0 < d)$ the probability that $|X - x_0| < d$. The inequality of Markoff can be written as follows:

(1) $$P(-d < X - x_0 < d) \ge 1 - \frac{M_r}{d^r}.$$

The inequality (1) is also called, for r = 2, the inequality of Tchebyscheff. The inequality (1) can be written in the following way:

$$P\big(-\lambda\sqrt[r]{M_r} < X - x_0 < \lambda\sqrt[r]{M_r}\big) \ge 1 - \frac{1}{\lambda^r}.$$

Substituting in the above inequality s for r and $\lambda\,\sqrt[r]{M_r}\big/\sqrt[s]{M_s}$ for $\lambda$ we get

(2) $$P\big(-\lambda\sqrt[r]{M_r} < X - x_0 < \lambda\sqrt[r]{M_r}\big) \ge 1 - \frac{1}{\lambda^s}\Big(\frac{\sqrt[s]{M_s}}{\sqrt[r]{M_r}}\Big)^s,$$

where r and s denote any integers and $\lambda$ denotes an arbitrary positive value.¹ Substituting in (2) 2s for s and 2 for r, we get the inequality of K. Pearson.²

By other substitutions we get the formulae of Lurquin, Cantelli, etc.³

As is well known, the inequality (1) cannot be improved⁴ for $d \ge \sqrt[r]{M_r}$. That is to say, to every $\epsilon > 0$ a random variable Y can be given such that

$$E|Y - x_0|^r = E|X - x_0|^r \quad\text{and}\quad P(-d < Y - x_0 < d) < 1 - \frac{M_r}{d^r} + \epsilon.$$

If the absolute moments $M_{i_1} = E(|X - x_0|^{i_1}), \ldots, M_{i_j} = E|X - x_0|^{i_j}$ of a random variable X are given (and no further data about X are known), then we shall say that $a_d$ is the "sharp" lower limit of $P(-d < X - x_0 < d)$ if the following two conditions are fulfilled:

(1) For each random variable Y, for which $E|Y - x_0|^{i_\nu} = E|X - x_0|^{i_\nu}$ ($\nu = 1, \ldots, j$), the inequality $P(-d < Y - x_0 < d) \ge a_d$ holds.

¹ The formula (2) has been given by A. Guldberg, Comptes Rendus, Paris, Vol. 176, p. 670.

² Biometrika, Vol. XII (1918-1919), pp. 284-296.

³ E. Lurquin, Comptes Rendus, Paris, Vol. 176, p. 681. Also Cantelli, Rendiconti della Reale Accademia dei Lincei, 1916.

⁴ See for instance, R. v. Mises, Wahrscheinlichkeitsrechnung, Leipzig, Vienna, Deuticke, 1931, p. 36.

(2) To each $\epsilon > 0$, a random variable Y can be given such that $E|Y - x_0|^{i_\nu} = E|X - x_0|^{i_\nu}$ ($\nu = 1, \ldots, j$) and $P(-d < Y - x_0 < d) < a_d + \epsilon$.

In other words, $a_d$ is the limes inferior⁵ of the probabilities $P(-d < Y - x_0 < d)$ formed for all random variables Y for which the $i_\nu$-th absolute moment about the point $x_0$ is equal to the $i_\nu$-th moment of X about the point $x_0$ ($\nu = 1, \ldots, j$).

Problem: The absolute moments $M_{i_1}, M_{i_2}, \ldots, M_{i_j}$ of a random variable X are given about the point $x_0$, where $i_1, i_2, \ldots, i_j$ denote any integers and $M_{i_\nu}$ denotes the moment of order $i_\nu$ ($\nu = 1, \ldots, j$). It is required to calculate the "sharp" lower limit of the probability $P(-d < X - x_0 < d)$ for any positive value d.

If only a single moment $M_r$ is given, our problem is already solved, because the inequality (1) gives us the "sharp" lower limit for $d \ge \sqrt[r]{M_r}$, and for $d < \sqrt[r]{M_r}$ the "sharp" limit is obviously equal to zero. But the case in which even two moments $M_r$ and $M_s$ are given has not yet been solved. The formula (2) gives us a limit for $P(-d < X - x_0 < d)$, but this limit is not "sharp", as can easily be shown.

We shall give here some results concerning the general case, and the complete solution if only two moments $M_r$ and $M_s$ are given. We shall see that the "sharp" limit is considerably greater than the limit given by (2).
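To make the single-moment case concrete, the following sketch evaluates the bound 1 - M_r/d^r for a small discrete distribution and builds a two-point variable with the same r-th absolute moment for which the bound is attained exactly. The distribution X, the helper names and the use of exact fractions are choices made for this illustration.

```python
from fractions import Fraction

def abs_moment(dist, x0, r):
    """Absolute moment of order r about x0 for a dict {value: probability}."""
    return sum(p * abs(x - x0) ** r for x, p in dist.items())

def prob_within(dist, x0, d):
    """P(-d < X - x0 < d)."""
    return sum(p for x, p in dist.items() if abs(x - x0) < d)

def markov_bound(M_r, d, r):
    """Lower limit 1 - M_r / d^r from inequality (1)."""
    return 1 - M_r / d ** r

x0, r, d = Fraction(0), 2, Fraction(2)
X = {Fraction(-1): Fraction(1, 4), Fraction(0): Fraction(1, 2),
     Fraction(2): Fraction(1, 4)}
M_r = abs_moment(X, x0, r)
bound = markov_bound(M_r, d, r)

# Two-point variable Y: mass M_r/d^r at distance d, the rest at x0.
Y = {x0 + d: M_r / d ** r, x0: 1 - M_r / d ** r}

print(M_r, bound, prob_within(X, x0, d), prob_within(Y, x0, d))
```

Here X itself satisfies the inequality with room to spare, while Y has the same second moment and meets the bound with equality, which is why the bound cannot be improved when only M_r is known and d is at least the r-th root of M_r.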

2. Some Propositions Concerning the General Case. We shall call a random variable X non-negative if $P(X < 0) = 0$. Since the absolute moments of the non-negative random variable $Y = |X - x_0|$ about the origin are equal to the absolute moments of X about the point $x_0$ and since $P(Y < d) = P(-d < X - x_0 < d)$, the following proposition holds true:

(I) Denote by $M_{i_1}, \ldots, M_{i_j}$ the absolute moments of order $i_1, \ldots, i_j$ of a certain random variable X about the point $x_0$. The limes inferior of the probabilities $P(-d < Y - x_0 < d)$ is equal to the limes inferior of the probabilities $P(Z < d)$, where $P(-d < Y - x_0 < d)$ is formed for all random variables Y for which the $i_\nu$-th absolute moment about $x_0$ is equal to $M_{i_\nu}$ ($\nu = 1, \ldots, j$), and $P(Z < d)$ is formed for all non-negative random variables Z for which the $i_\nu$-th moment about the origin is equal to $M_{i_\nu}$ ($\nu = 1, \ldots, j$).

On account of the proposition (I) we can restrict ourselves to the consideration of non-negative random variables and of the moments about the origin.

A random variable X for which k different values $x_1, \ldots, x_k$ exist such that the probability $p(x_i)$ of $x_i$ (i = 1, ..., k) is positive and $\sum_{i=1}^{k} p(x_i) = 1$, is called an arithmetic random variable of degree k. A random variable X will be called t-limited, if $P(-t \le X \le t) = 1$. We shall prove the following proposition.

(II) Let us denote by $M_{i_1}, M_{i_2}, \ldots, M_{i_j}$ the absolute moments of order $i_1, \ldots, i_j$ of a certain non-negative random variable X, about the origin. Denote by $\Omega(k, t)$ the set of all non-negative t-limited arithmetic random variables of degree $\le k$ for which the $i_\nu$-th moment about the origin is equal to $M_{i_\nu}$ ($\nu = 1, \ldots, j$). $\Omega(k, t)$ is supposed to be not empty.⁶ Denote further by $a(d, k, t)$ the limes inferior of the probabilities $P(Y < d)$ formed for all random variables Y of the set $\Omega(k, t)$. Then we can say: There exists in $\Omega(k, t)$ a random variable Z for which $P(Z < d) = a(d, k, t)$. If $0 < a(d, k, t) < 1$ and Z is a random variable in $\Omega(k, t)$ for which $P(Z < d) = a(d, k, t)$, then there exist at most j - 1 different positive values $x_1, \ldots, x_{j-1}$ such that $x_i \ne d$, $x_i \ne t$ and the probability $p(x_i)$ of $x_i$ is positive (i = 1, 2, ..., j - 1).

⁵ The limes inferior of a set N of numbers is the greatest value y for which the inequality $y \le x$ for each element x of N holds true. This is also called greatest lower bound.

At first we shall prove that there exists a random variable Z in $\Omega(k, t)$ such that $P(Z < d) = a(d, k, t)$. Since $a(d, k, t)$ is the limes inferior of $P(Y < d)$ formed for all random variables Y in $\Omega(k, t)$, there exists in $\Omega(k, t)$ a sequence $\{Z_i\}$ (i = 1, 2, ..., ad inf.) of random variables, such that $\lim_{i \to \infty} P(Z_i < d) = a(d, k, t)$. Arranged in ascending order of magnitude, the values of $Z_i$ which have a positive probability are denoted by $x_{i,1}, x_{i,2}, \ldots, x_{i,k_i}$. Since $Z_i$ is a t-limited non-negative arithmetic random variable of degree $\le k$, we have $k_i \le k$ and $0 \le x_{i,r} \le t$ ($r = 1, \ldots, k_i$). It follows easily from this fact that there exists a subsequence $\{Z_{i_\nu}\}$ ($\nu$ = 1, 2, ..., ad inf.) of $\{Z_i\}$ with two properties: First, that the variables $Z_{i_1}, Z_{i_2}, \ldots$ are of the same degree (say s), that is to say $k_{i_\nu} = s$ ($\nu$ = 1, 2, ..., ad inf.); and second, that the sequence $\{x_{i_\nu,r}\}$ ($\nu$ = 1, 2, ..., ad inf.) converges for each integer $r \le s$. Let us denote $\lim_{\nu \to \infty} x_{i_\nu,r}$ by $x_r$ (r = 1, 2, ..., s), and the probability that $Z_{i_\nu}$ takes the value $x_{i_\nu,r}$ by $p_{i_\nu,r}$. It is obvious that there exists a subsequence $\{Z_{n_\nu}\}$ ($\nu$ = 1, 2, ..., ad inf.) of $\{Z_{i_\nu}\}$ such that the sequence $\{p_{n_\nu,r}\}$ converges with increasing $\nu$. Let us denote $\lim_{\nu \to \infty} p_{n_\nu,r}$ by $p_r$ (r = 1, 2, ..., s). Since $p_{n_\nu,1} + \cdots + p_{n_\nu,s} = 1$, $\sum_{r=1}^{s} p_r = 1$ must hold true. We consider the random variable Z for which the probability that $Z = x_r$ is equal to $p_r$ (r = 1, 2, ..., s) and for which no values except $x_1, \ldots, x_s$ are possible. The random variable Z is obviously an element of $\Omega(k, t)$ and $P(Z < d) = a(d, k, t)$.

Let us consider the case in which $0 < a(d, k, t) < 1$ and denote by Z a random variable of $\Omega(k, t)$ for which $P(Z < d) = a(d, k, t)$. We shall prove that there exist at most j - 1 different positive values $x_1, \ldots, x_{j-1}$ such that $x_i \ne d$, $x_i \ne t$ and the probability $p(x_i)$ of $x_i$ is positive (i = 1, 2, ..., j - 1). In order to prove this statement we shall suppose that there exist j different positive points $x_1, \ldots, x_j$ such that $x_i \ne d$, $x_i \ne t$ and $p(x_i) > 0$ (i = 1, 2, ..., j). Then we can write

$$\sum_{r=1}^{j} x_r^{\,i_\nu}\,p(x_r) = M_{i_\nu} - \sum x^{\,i_\nu}\,p(x) \qquad (\nu = 1, 2, \ldots, j),$$

where the summations on the right hand sides are to be taken over all values of x which are different from $x_1, \ldots, x_j$ and for which $p(x) > 0$.

⁶ This is certainly the case, if we choose k and t great enough.

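Since $P(Z < d) = a(d, k, t)$ and $0 < a(d, k, t) < 1$ by hypothesis, there exist two non-negative values b and c, such that $b < d$, $c \ge d$, $p(b) > 0$, and $p(c) > 0$.

We define a new arithmetic random variable Z' as follows: $p'(b) = p(b) - \epsilon$, $p'(c) = p(c) + \epsilon$, and for all other values $p'(x) = p(x)$, where $p'(x)$ denotes the probability that $Z' = x$, and $\epsilon$ a positive number less than $p(b)$. Z' is obviously a non-negative arithmetic variable of the same degree as Z. The moments about the origin of the order $i_1, i_2, \ldots, i_j$ of Z' will in general not be equal to the corresponding moments of Z. However this can be obtained by a small displacement of the points $x_1, \ldots, x_j$ into a system of neighboring points $\bar{x}_1, \ldots, \bar{x}_j$, provided that $\epsilon$ is small enough. In order to show this, we have only to prove that the functional determinant

$$\Delta' = \begin{vmatrix} i_1 x_1^{\,i_1 - 1}\,p'(x_1) & \cdots & i_1 x_j^{\,i_1 - 1}\,p'(x_j) \\ \vdots & & \vdots \\ i_j x_1^{\,i_j - 1}\,p'(x_1) & \cdots & i_j x_j^{\,i_j - 1}\,p'(x_j) \end{vmatrix}$$

of the functions $f_1(\xi_1, \ldots, \xi_j) = \sum_{r=1}^{j} \xi_r^{\,i_1}\,p'(x_r), \ \ldots, \ f_j(\xi_1, \ldots, \xi_j) = \sum_{r=1}^{j} \xi_r^{\,i_j}\,p'(x_r)$ does not vanish at the point $\xi_1 = x_1, \ldots, \xi_j = x_j$. Since $p'(x_1), p'(x_2), \ldots, p'(x_j)$ are not equal to zero, we have only to show that

$$\Delta = \begin{vmatrix} x_1^{\,i_1 - 1} & \cdots & x_j^{\,i_1 - 1} \\ \vdots & & \vdots \\ x_1^{\,i_j - 1} & \cdots & x_j^{\,i_j - 1} \end{vmatrix} = (x_1 x_2 \cdots x_j)^{\,i_1 - 1} \begin{vmatrix} 1 & \cdots & 1 \\ x_1^{\,i_2 - i_1} & \cdots & x_j^{\,i_2 - i_1} \\ \vdots & & \vdots \\ x_1^{\,i_j - i_1} & \cdots & x_j^{\,i_j - i_1} \end{vmatrix} \ne 0,$$

where $i_2 - i_1, \ldots, i_j - i_1$ can be assumed positive by denoting by $i_1$ the smallest of the integers $i_1, i_2, \ldots, i_j$.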

Let us consider the polynomial in x given by

$$R(x) = \begin{vmatrix} 1 & \cdots & 1 & 1 \\ x_1^{\,i_2 - i_1} & \cdots & x_{j-1}^{\,i_2 - i_1} & x^{\,i_2 - i_1} \\ \vdots & & \vdots & \vdots \\ x_1^{\,i_j - i_1} & \cdots & x_{j-1}^{\,i_j - i_1} & x^{\,i_j - i_1} \end{vmatrix}.$$

According to a well-known algebraic proposition,⁷ the number of positive roots of R(x) is less than or equal to the number of changes of sign in the sequence of coefficients of R(x). Since the number of changes of sign in R(x) is obviously less than or equal to j - 1, the number of positive roots of R(x) is also less than or equal to j - 1. On the other hand $x = x_1, \ldots, x = x_{j-1}$ are j - 1 positive roots of R(x). Hence for any positive value $x \ne x_1, \ne x_2, \ldots, \ne x_{j-1}$ the polynomial R(x) does not vanish. Thus $R(x_j)$ and therefore also $\Delta'$ and $\Delta$ are not equal to zero.

Let us denote by Z* the random variable which we get from Z' by a small displacement of the points $x_1, \ldots, x_j$ into a system of neighboring points $\bar{x}_1, \ldots, \bar{x}_j$ such that the moment of order $i_\nu$ of Z* about the origin becomes equal to $M_{i_\nu}$ ($\nu$ = 1, 2, ..., j). By choosing $\epsilon$ small enough we can obtain the values $\bar{x}_1, \ldots, \bar{x}_j$ as near to $x_1, \ldots, x_j$ as we like. In particular, $\epsilon$ can be chosen so small that $\bar{x}_1, \ldots, \bar{x}_j$ are positive numbers less than t, and $\bar{x}_i > d$ or $< d$ accordingly as $x_i > d$ or $< d$. Then Z* is obviously an element of $\Omega(k, t)$. But for Z*

$$P(Z^* < d) = P(Z' < d) = P(Z < d) - \epsilon = a(d, k, t) - \epsilon$$

holds true, which is a contradiction because $a(d, k, t)$ is the limes inferior of $P(Y < d)$ formed for all random variables Y contained in $\Omega(k, t)$. Hence our assumption that there exist j different positive numbers $x_1, \ldots, x_j$, for which $x_i \ne d$, $x_i \ne t$ and $p(x_i) > 0$ (i = 1, 2, ..., j), cannot be true, and the proposition II is proved in all its parts.
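The non-vanishing of the determinant of powers used in this proof can be illustrated on a small case. The sketch below evaluates, in exact rational arithmetic, a determinant whose rows are different integer powers of distinct positive nodes; the nodes (1, 2, 3), the exponents (0, 2, 5) and the function names are arbitrary choices for the example, and a single numerical case is of course only an illustration, not a proof.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by fraction-exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in matrix]
    n, sign, result = len(M), 1, Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return Fraction(0)          # a zero column: singular matrix
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            sign = -sign                # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return sign * result

def power_determinant(nodes, exponents):
    """Determinant whose row for exponent e is (x_1**e, ..., x_j**e)."""
    return det([[Fraction(x) ** e for x in nodes] for e in exponents])

value = power_determinant([1, 2, 3], [0, 2, 5])
repeated = power_determinant([1, 2, 2], [0, 2, 5])   # two equal nodes
print(value, repeated)
```

With distinct positive nodes the determinant is different from zero, whereas repeating a node makes two columns equal and the determinant vanish, in line with the role of R(x) in the argument.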

It follows from the proposition II that $a(d, k, t)$ is independent of k. On account of this fact and of the fact that any random variable X can be arbitrarily well approximated by arithmetic random variables, we get the proposition:

III. Let us denote by $M_{i_1}, \ldots, M_{i_j}$ the moments about the origin of order $i_1, \ldots, i_j$ of a certain non-negative random variable. Denote by $\Omega(t)$ the set of all non-negative t-limited random variables, for which the $i_\nu$-th moment about the origin is equal to $M_{i_\nu}$ ($\nu = 1, \ldots, j$). Denote further by $a(d, t)$ the limes inferior of the probabilities $P(Y < d)$ formed for all random variables Y contained in $\Omega(t)$. Then we can say: There exists in $\Omega(t)$ a random variable Z for which $P(Z < d) = a(d, t)$. If $0 < a(d, t) < 1$ and Z is a random variable for which $P(Z < d) = a(d, t)$, then there exist at most j - 1 different positive numbers $x_1, \ldots, x_{j-1}$ such that $x_i \ne d$, $x_i \ne t$, and the probability that $Z = x_i$ is positive (i = 1, 2, ..., j - 1).

It is obvious that $a(d, t)$ decreases monotonically with increasing t. Hence $\lim_{t \to \infty} a(d, t)$ exists and it can be easily shown that:

$a(d, t)$ converges towards $a_d$ if $t \to \infty$.

3. Solution of the Problem if Only Two Moments are Given. Let us denote by $M_r$ and $M_s$ the absolute moments respectively of order r and s about the point $x_0$ of a certain random variable X, where r and s (r < s) denote any integers.

Let us first consider the case

($\alpha$) $$\frac{M_r}{d^r} \le \frac{M_s}{d^s}.$$

It follows from (1) that

$$a_d \ge 1 - \frac{M_r}{d^r}.$$

We shall show that $a_d = 1 - \dfrac{M_r}{d^r}$ if $\dfrac{M_r}{d^r} \le 1$. For this purpose let us consider the arithmetic random variable $Y_\delta$ of degree 3 defined as follows:

$$p(x_0 + d) = \frac{M_r}{d^r} - \epsilon, \qquad p(x_0 + d + \delta) = \epsilon\Big(\frac{d}{d + \delta}\Big)^r,$$

$$p(x_0) = 1 - p(x_0 + d) - p(x_0 + d + \delta),$$

where $\epsilon$ is a positive number and $p(u)$ denotes the probability for $Y_\delta = u$. The r-th moment about $x_0$ of $Y_\delta$ is obviously equal to $M_r$. On account of ($\alpha$) the s-th moment of $Y_\delta$ about $x_0$ is less than or equal to $M_s$ for $\delta = 0$. On the other hand the s-th moment of $Y_\delta$ about $x_0$ will be greater than $M_s$ if $\delta$ is sufficiently large. Hence there exists a non-negative value $\delta_0$ such that the s-th moment of $Y_{\delta_0}$ is equal to $M_s$.

Since

$$P(-d < Y_{\delta_0} - x_0 < d) = 1 - \frac{M_r}{d^r} + \epsilon - \epsilon\Big(\frac{d}{d + \delta_0}\Big)^r < 1 - \frac{M_r}{d^r} + \epsilon$$

and since $\epsilon$ can be chosen arbitrarily small, we have

$$a_d = 1 - \frac{M_r}{d^r}.$$

If $\dfrac{M_r}{d^r} > 1$, then $a_d$ is equal to zero, because $a_d$ decreases monotonically with decreasing d and $a_d \to 0$ for $d \to \sqrt[r]{M_r}$.
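The construction in the first case can be traced numerically. The sketch below assumes the three-point form of Y_δ with mass M_r/d^r - ε at distance d, mass ε(d/(d+δ))^r at distance d + δ and the remainder at x₀ (distances are measured from x₀, so x₀ is taken as 0), and finds δ₀ by bisection so that the s-th moment equals M_s. The numerical values and all function names are illustrative assumptions.

```python
def y_delta(M_r, r, d, eps, delta):
    """(distance, probability) pairs of the three-point variable Y_delta."""
    p_d = M_r / d ** r - eps
    p_far = eps * (d / (d + delta)) ** r
    return [(0.0, 1.0 - p_d - p_far), (d, p_d), (d + delta, p_far)]

def moment(pairs, order):
    return sum(p * x ** order for x, p in pairs)

def find_delta0(M_r, M_s, r, s, d, eps):
    """Bisection for the delta at which the s-th moment equals M_s."""
    lo, hi = 0.0, 1.0
    while moment(y_delta(M_r, r, d, eps, hi), s) < M_s:
        hi *= 2.0                       # enlarge the bracket until it overshoots
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if moment(y_delta(M_r, r, d, eps, mid), s) < M_s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative case: M_r/d^r = 0.25 does not exceed M_s/d^s = 0.375.
r, s, d, M_r, M_s, eps = 2, 4, 2.0, 1.0, 6.0, 0.01
delta0 = find_delta0(M_r, M_s, r, s, d, eps)
Y = y_delta(M_r, r, d, eps, delta0)
p_inside = Y[0][1]                      # P(-d < Y - x0 < d) is the mass at x0
print(delta0, moment(Y, r), moment(Y, s), p_inside)
```

The probability inside the interval stays below 1 - M_r/d^r + ε = 0.76 and approaches 1 - M_r/d^r = 0.75 as ε is made smaller.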

We have now to consider the case

($\beta$) $$\frac{M_r}{d^r} > \frac{M_s}{d^s}.$$

First we shall show that

(3) $$\frac{M_r}{d^r} < 1.$$

In fact, if $\dfrac{M_r}{d^r}$ were $\ge 1$, then making use of ($\beta$) we have $\Big(\dfrac{M_r}{d^r}\Big)^{s/r} \ge \dfrac{M_r}{d^r} > \dfrac{M_s}{d^s}$, and hence $(M_r)^{s/r} > M_s$. But this is not possible, because according to the well-known inequalities between moments, $(M_r)^{s/r}$ is less than or equal to $M_s$. It follows from (3) and ($\beta$) that

(4) $$\frac{M_s}{d^s} < 1.$$

In order to calculate $a_d$, we shall apply the propositions found in section 2. On account of the proposition I, $a_d$ is equal to the limes inferior of $P(Y < d)$ where $P(Y < d)$ is formed for all non-negative random variables Y for which the r-th moment about the origin is equal to $M_r$ and the s-th moment about the origin is equal to $M_s$. Hence we can restrict ourselves to the consideration of non-negative random variables and of the moments about the origin.

We shall show that $0 < a(d, t)$ holds for any positive value t. In order to prove this, it is sufficient to show that $a_d > 0$ since $a(d, t) \ge a_d$. It follows from the inequality (1) that $a_d \ge 1 - \dfrac{M_r}{d^r}$. Since, according to (3), $\dfrac{M_r}{d^r} < 1$, we have $a_d > 0$, and therefore also

(5) $$a(d, t) > 0.$$

Let us see whether $a(d, t) < 1$. If $M_s = (M_r)^{s/r}$, then, as is well-known, only a single non-negative random variable X exists for which the r-th moment about the origin is equal to $M_r$ and the s-th moment is equal to $(M_r)^{s/r}$, namely the arithmetic random variable X of degree 1 for which the probability that $X = \sqrt[r]{M_r}$ is equal to 1. Since $\sqrt[r]{M_r} < d$, as can be seen from (3), we have $P(X < d) = 1$, and therefore $a_d = 1$. Hence in this case our problem is already solved and we have to consider only the alternative:

(6) $$M_s = M_r^{\,s/r} + q \qquad (q > 0).$$

We shall show that $a(d, t) < 1$ for $t > \sqrt[r]{M_r} + d$. For this purpose let us consider the non-negative arithmetic random variable $Y_\epsilon$ of the degree 3 defined as follows:

$$p(\sqrt[r]{M_r}) = 1 - \epsilon, \qquad p(t) = \frac{\epsilon M_r}{t^r},$$

$$p(0) = 1 - p(\sqrt[r]{M_r}) - p(t),$$

where $p(u)$ denotes the probability for $Y_\epsilon = u$, and $\epsilon$ is a positive number < 1. The r-th moment of $Y_\epsilon$ is equal to

$$M_r\,p(\sqrt[r]{M_r}) + t^r p(t) = M_r.$$

The s-th moment of $Y_\epsilon$ is given by the expression

$$A_\epsilon = M_r^{\,s/r}\,p(\sqrt[r]{M_r}) + t^s p(t) = (1 - \epsilon)M_r^{\,s/r} + \epsilon\,t^{s-r}M_r.$$

On account of (6), $A_\epsilon$ is less than $M_s$ for $\epsilon = 0$. For $\epsilon = 1$ we have

$$A_\epsilon = t^{s-r}M_r > d^{s-r}M_r.$$

Since from ($\beta$) $d^{s-r}M_r > M_s$, we have $A_\epsilon > M_s$ for $\epsilon = 1$. Hence there exists a positive value $\epsilon_0 < 1$ for which $A_{\epsilon_0} = M_s$. Thus the r-th moment of $Y_{\epsilon_0}$ is equal to $M_r$ and the s-th moment of $Y_{\epsilon_0}$ is equal to $M_s$. We have

$$P(Y_{\epsilon_0} < d) = p(0) + p(\sqrt[r]{M_r}) = \epsilon_0 - \epsilon_0\frac{M_r}{t^r} + 1 - \epsilon_0 = 1 - \epsilon_0\frac{M_r}{t^r} < 1.$$

Hence

(7) $$a(d, t) < 1.$$
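The three-point variable used to show a(d, t) < 1 can be worked out exactly in a small case. The sketch below assumes masses 1 - ε at the r-th root of M_r, εM_r/t^r at t and the remainder at 0, and solves the s-th moment condition, which is linear in ε, for ε₀. The function names and the values r = 2, s = 4, M_r = 1, M_s = 2, d = 2, t = 4 are illustrative; the root of M_r is passed in directly so that the arithmetic stays rational.

```python
from fractions import Fraction

def y_eps(root, r, t, eps):
    """(value, probability) pairs of Y_eps; `root` is the r-th root of M_r."""
    M_r = root ** r
    p_root = 1 - eps
    p_t = eps * M_r / t ** r
    return [(Fraction(0), 1 - p_root - p_t), (root, p_root), (t, p_t)]

def moment(pairs, order):
    return sum(p * x ** order for x, p in pairs)

def eps0(root, r, s, t, M_s):
    """Solve (1 - eps) * root**s + eps * t**(s - r) * M_r = M_s for eps."""
    M_r = root ** r
    low = root ** s                    # s-th moment at eps = 0
    high = t ** (s - r) * M_r          # s-th moment at eps = 1
    return (M_s - low) / (high - low)

root, r, s = Fraction(1), 2, 4
M_s, d, t = Fraction(2), Fraction(2), Fraction(4)
e0 = eps0(root, r, s, t, M_s)
Y = y_eps(root, r, t, e0)
prob_below_d = sum(p for x, p in Y if x < d)
print(e0, moment(Y, r), moment(Y, s), prob_below_d)
```

In this example ε₀ = 1/15 and P(Y < d) = 239/240, which is 1 - ε₀M_r/t^r and hence strictly below 1.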

On account of (5) and (7) it follows from proposition III, that there exists a non-negative arithmetic random variable X belonging to the set $\Omega(t)$ such that $P(X < d) = a(d, t)$ and there exists at most one positive value $\delta \ne d$, $\ne t$ with positive probability. Hence $a(d, t)$ is equal to the limes inferior of the probabilities $P(Y < d)$ formed for all non-negative arithmetic random variables Y which have the following two properties:

(1) The r-th moment about the origin is equal to $M_r$ and the s-th moment about the origin is equal to $M_s$.

(2) There exists at most a single positive value $\delta$ ($\ne d$, $\ne t$) with positive probability.

Denote by Z a non-negative t-limited random variable with the properties (1), (2), and for which $P(Z < d) = a(d, t)$. The following equations hold:

(8) $$\begin{aligned} p(0) + p(\delta) + p(d) + p(t) &= 1, \\ p(\delta)\delta^r + p(d)d^r + p(t)t^r &= M_r, \\ p(\delta)\delta^s + p(d)d^s + p(t)t^s &= M_s, \end{aligned}$$

where $p(u)$ denotes the probability that $Z = u$. From the last two equations of (8), we get

(9) $$p(\delta) = \frac{M_r d^{s-r} - M_s + p(t)\,[\,t^s - t^r d^{s-r}\,]}{\delta^r\,(d^{s-r} - \delta^{s-r})},$$

(10) $$p(d) = \frac{M_s - \delta^{s-r}M_r - p(t)\,[\,t^s - t^r\delta^{s-r}\,]}{d^r\,(d^{s-r} - \delta^{s-r})}.$$

Since $\dfrac{M_r}{d^r} > \dfrac{M_s}{d^s}$ and $t > d$, the numerator in (9) is positive. Since $0 \le p(\delta) \le 1$, the inequality

(11) $$0 < \delta < d$$

must hold. Hence

(12) $$p(\delta) > 0.$$
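Solving the two moment equations for the masses at δ and d, given the mass at t, is elementary linear algebra, and the result can be checked by substituting back. The sketch below does this in exact arithmetic; the function name `masses` and the numerical values (r = 2, s = 4, M_r = 1, M_s = 2, δ = 1, d = 2, t = 4) are assumptions made for the illustration.

```python
from fractions import Fraction

def masses(M_r, M_s, r, s, delta, d, t, p_t):
    """Return (p(0), p(delta), p(d)) from the two moment equations, given p(t)."""
    gap = d ** (s - r) - delta ** (s - r)
    p_delta = (M_r * d ** (s - r) - M_s
               + p_t * (t ** s - t ** r * d ** (s - r))) / (delta ** r * gap)
    p_d = (M_s - delta ** (s - r) * M_r
           - p_t * (t ** s - t ** r * delta ** (s - r))) / (d ** r * gap)
    return 1 - p_delta - p_d - p_t, p_delta, p_d

r, s = 2, 4
M_r, M_s = Fraction(1), Fraction(2)
delta, d, t = Fraction(1), Fraction(2), Fraction(4)

# Without any mass at t.
p0, p_delta, p_d = masses(M_r, M_s, r, s, delta, d, t, Fraction(0))

# With a small mass at t; the moment equations must still hold.
p_t = Fraction(1, 1000)
q0, q_delta, q_d = masses(M_r, M_s, r, s, delta, d, t, p_t)
check_r = q_delta * delta ** r + q_d * d ** r + p_t * t ** r
check_s = q_delta * delta ** s + q_d * d ** s + p_t * t ** s
print(p0, p_delta, p_d, q0, q_delta, q_d)
```

In this example a positive mass at t raises p(δ) and lowers p(d), as the signs of the bracketed terms in the two formulas indicate.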

Wo shall Bhow that p(t) = 0 if t is sufficiently large. For to purpose let 
us make the assumption p(l) !> 0. We define a new random variable Z 1 os 
follows: 


p'(i) « p(0 — «where 0 < t < p(l) 



262 


A. WALD 


and 


p'(d) ®5 p(d) 


€ d r (d*~ r — a»“ r ) 


m - P(A - 


«y - fiQ 

® r (d*” f — i ,_r ) 


p'(o) = l - p'(a) - p'(d)-p'(0 


p'(s) =* 0 for all values z 9* 0, s *d, i*t. 

, p\u) denotes 1 the probability that 1* u. 

The equations (8) remain satisfied if we substitute $p'(0)$, $p'(\delta)$, $p'(d)$, and
$p'(t)$ for $p(0)$, $p(\delta)$, $p(d)$, and $p(t)$ respectively. Hence the r-th moment of $Z'$
is equal to $M_r$ and the s-th moment is equal to $M_s$. We have to show that $Z'$
is in fact a random variable, that is to say, that the defined probabilities are
$\ge 0$ and $\le 1$. It is sufficient to show that the defined probabilities are non-negative,
because the sum of them is equal to 1 and therefore they must be $\le 1$.

Obviously $p'(t)$ is $> 0$. Since $t > d$ and according to (11) $d > \delta$, we have
$p'(d) > p(d) \ge 0$. According to (12), $p(\delta)$ is positive. Hence for $\epsilon$ sufficiently
small $p'(\delta)$ is also positive. We have to show that also $p'(0) > 0$. $p'(0)$ is
given by

$p'(0) = 1 - p'(\delta) - p'(d) - p'(t)$

$= 1 - p(\delta) - p(d) - p(t) + \epsilon\left[1 + \dfrac{t^r(t^{s-r} - d^{s-r})}{\delta^r(d^{s-r} - \delta^{s-r})} - \dfrac{t^r(t^{s-r} - \delta^{s-r})}{d^r(d^{s-r} - \delta^{s-r})}\right]$

$= p(0) + \epsilon\,\dfrac{d^r\delta^r(d^{s-r} - \delta^{s-r}) + d^r t^r(t^{s-r} - d^{s-r}) - \delta^r t^r(t^{s-r} - \delta^{s-r})}{d^r\delta^r(d^{s-r} - \delta^{s-r})}$

Since $p(0) \ge 0$, $\epsilon > 0$, $d > \delta$ and $s > r$, this last expression is positive if $t$ is
sufficiently large. We may assume $t$ so great that $p'(0) > 0$, because we want
to calculate only

$a_d = \lim_{t \to \infty} a(d, t).$

Now we shall show that

$p'(d) + p'(t) > p(d) + p(t).$

In fact

$p'(d) + p'(t) - p(d) - p(t) = \epsilon\left[\dfrac{t^r(t^{s-r} - \delta^{s-r})}{d^r(d^{s-r} - \delta^{s-r})} - 1\right] > 0.$



Then

$p'(0) + p'(\delta) < p(0) + p(\delta) = a(d, t)$

must follow. Since $p'(0) + p'(\delta) = P(Z' < d)$, we have a contradiction and
therefore the assumption $p(t) > 0$ is reduced to an absurdity. Hence $p(t)$
must be equal to zero and $a(d, t) = a_d$. If we substitute zero for $p(t)$ in (8),
(9), and (10) we obtain:


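(13)  $p(0) + p(\delta) + p(d) = 1$
      $p(\delta)\delta^r + p(d)d^r = M_r$
      $p(\delta)\delta^s + p(d)d^s = M_s$

(14)  $p(\delta) = \dfrac{M_r d^{s-r} - M_s}{\delta^r (d^{s-r} - \delta^{s-r})}$

(15)  $p(d) = \dfrac{M_s - M_r \delta^{s-r}}{d^r (d^{s-r} - \delta^{s-r})}$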


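We shall prove that $p(0) = 0$. For this purpose let us make the assumption
$p(0) > 0$. Denote by $\delta_1$ a positive number $< \delta$ and let us consider the
arithmetic random variable $Z'$ of degree 3 defined as follows:

$p'(\delta_1) = \dfrac{M_r d^{s-r} - M_s}{\delta_1^r (d^{s-r} - \delta_1^{s-r})}$

$p'(d) = \dfrac{M_s - M_r \delta_1^{s-r}}{d^r (d^{s-r} - \delta_1^{s-r})}$

$p'(0) = 1 - p'(\delta_1) - p'(d).$

The r-th moment of $Z'$ is evidently equal to $M_r$ and the s-th moment to $M_s$.
Since $p(\delta) > 0$ according to (12), and $p(0) > 0$ by hypothesis, $p'(0)$ and $p'(\delta_1)$
will be greater than zero if $\delta_1$ is sufficiently near to $\delta$. The derivative of $p'(d)$
with respect to $\delta_1$ is given by

$\dfrac{-M_r (s-r)\,\delta_1^{s-r-1}(d^{s-r} - \delta_1^{s-r}) + (s-r)\,\delta_1^{s-r-1}(M_s - M_r\delta_1^{s-r})}{d^r (d^{s-r} - \delta_1^{s-r})^2} = \dfrac{(s-r)\,\delta_1^{s-r-1}(M_s - M_r d^{s-r})}{d^r (d^{s-r} - \delta_1^{s-r})^2}.$

Since $M_r/d^r > M_s/d^s$, the above expression is negative. Hence $p'(d)$ decreases
with increasing $\delta_1$. Since $\delta_1 < \delta$, we have

$p'(d) > p(d) > 0$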


and therefore

$1 - p'(d) < 1 - p(d) = a_d.$

Since $1 - p'(d) = P(Z' < d)$, we have a contradiction and the assumption
$p(0) > 0$ is proved an absurdity. Hence $p(0) = 0$, and $p(\delta) + p(d) = 1$.
From (13), (14) and (15) we have

$p(\delta) + p(d) = \dfrac{M_r d^{s-r} - M_s}{\delta^r (d^{s-r} - \delta^{s-r})} + \dfrac{M_s - M_r \delta^{s-r}}{d^r (d^{s-r} - \delta^{s-r})} = 1.$

Hence

(16)  $M_r d^s - M_s d^r + \delta^r (M_s - d^s) + \delta^s (d^r - M_r) = 0.$


The equation (16) in $\delta$ has at most two positive roots, because the derivative
of the left hand side of (16)

$r\,\delta^{r-1}(M_s - d^s) + s\,\delta^{s-1}(d^r - M_r)$

has exactly one positive root in $\delta$. Since $\delta = d$ is a root of (16), the value of $\delta$
which we are seeking must be the second positive root of (16), which we shall
denote by $\delta_0$.

It can be easily shown that $\delta_0 \le \sqrt[r]{M_r} < d$. In fact, for $\delta = 0$ the left
hand side of (16) is positive on account of the assumption $M_r/d^r > M_s/d^s$, and for
$\delta = \sqrt[r]{M_r}$ it becomes equal to

$M_s (M_r - d^r) - M_r^{s/r} (M_r - d^r) = (M_s - M_r^{s/r})(M_r - d^r).$

Since $M_s \ge M_r^{s/r}$ and recalling from (3) that $M_r < d^r$, the above expression is
less than or equal to 0. Hence $\delta_0$ lies between 0 and $\sqrt[r]{M_r} < d$.

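Hence $a_d$ is given by the expression

(17)  $a_d = 1 - \dfrac{M_s - M_r \delta_0^{s-r}}{d^r (d^{s-r} - \delta_0^{s-r})}.$

For $s = 2r$ the root $\delta_0$ can be easily calculated. We get

(18)  $\delta_0 = \sqrt[r]{\dfrac{M_{2r} - d^r M_r}{M_r - d^r}}.$

If we substitute in (17) $2r$ for $s$ and the right hand side of (18) for $\delta_0$, then
we get

$a_d = 1 - \dfrac{(M_r - d^r) M_{2r} - M_r (M_{2r} - d^r M_r)}{d^r [d^r (M_r - d^r) - M_{2r} + M_r d^r]}$

$= 1 - \dfrac{d^r (M_r^2 - M_{2r})}{d^r [2 M_r d^r - d^{2r} - M_{2r}]}$

$= 1 - \dfrac{M_r^2 - M_{2r}}{2 M_r d^r - d^{2r} - M_{2r}}.$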



Let us denote the non-negative number $M_{2r} - M_r^2$ by $\sigma^2$; then we obtain¹

(19)  $a_d = 1 - \dfrac{\sigma^2}{(d^r - M_r)^2 + \sigma^2}.$

Let us compare the "sharp" limit given by (19) with the limit given by (2). If
we substitute, in (2), $2r$ for $s$ and $d$ for the multiple of $\sqrt[r]{M_r}$ appearing there, we have



as a lower limit of the probability $P(-d < X - x_0 < d)$. We see that for
small values of $\sigma^2$, this limit is considerably smaller than $a_d$.

Our results may be summarized in the following

Theorem: Denote by $M_r$ the r-th and by $M_s$ the s-th absolute moment of a
random variable $X$ about the point $x_0$, where $r < s$. For any positive value $d$
denote by $P(-d < X - x_0 < d)$ the probability that $|X - x_0| < d$. The "sharp"
lower limit $a_d$ of $P(-d < X - x_0 < d)$ is defined as the limes inferior of the
probabilities $P(-d < Y - x_0 < d)$ formed for all random variables $Y$ for which
the r-th moment about $x_0$ is equal to $M_r$ and the s-th moment about $x_0$ is equal to
$M_s$. We have to distinguish two cases.

I. $M_r/d^r \le M_s/d^s$. In this case $a_d = 1 - M_r/d^r$ if $M_r/d^r < 1$, and $a_d = 0$ if $M_r/d^r \ge 1$.

II. $M_r/d^r > M_s/d^s$. In this case $a_d$ is given by

(17)  $a_d = 1 - \dfrac{M_s - M_r \delta_0^{s-r}}{d^r (d^{s-r} - \delta_0^{s-r})},$

where $\delta_0$ is the positive root $\neq d$ of the equation² in $\delta$

$M_r d^s - M_s d^r + \delta^r (M_s - d^s) + \delta^s (d^r - M_r) = 0.$

For $s = 2r$ we have

$\delta_0 = \sqrt[r]{\dfrac{M_{2r} - d^r M_r}{M_r - d^r}}.$

If we substitute in (17) $2r$ for $s$ and the above expression for $\delta_0$, we obtain

$a_d = 1 - \dfrac{\sigma^2}{(d^r - M_r)^2 + \sigma^2},$

where $\sigma^2 = M_{2r} - M_r^2$.
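The theorem's case II can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names and the bisection tolerance are assumptions made for illustration. It locates $\delta_0$ as the root of (16) in the interval $(0, \sqrt[r]{M_r}]$, evaluates (17), and compares the result for $s = 2r$ with the closed form involving $\sigma^2$.

```python
def sharp_lower_limit(M_r, M_s, r, s, d, tol=1e-13):
    """Return (delta_0, a_d) for case II, using equations (16) and (17).

    Illustrative helper (hypothetical name); assumes M_r/d^r > M_s/d^s.
    """
    if not (M_r / d**r > M_s / d**s):
        raise ValueError("case II requires M_r/d^r > M_s/d^s")

    def lhs16(delta):
        # Left-hand side of equation (16) as a function of delta.
        return (M_r * d**s - M_s * d**r
                + delta**r * (M_s - d**s)
                + delta**s * (d**r - M_r))

    # The text shows lhs16(0) > 0 and lhs16(M_r^(1/r)) <= 0, so bisect there.
    lo, hi = 0.0, M_r ** (1.0 / r)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs16(mid) > 0:
            lo = mid
        else:
            hi = mid
    delta0 = 0.5 * (lo + hi)
    p_d = (M_s - M_r * delta0**(s - r)) / (d**r * (d**(s - r) - delta0**(s - r)))
    return delta0, 1.0 - p_d


def closed_form_s_equals_2r(M_r, M_2r, r, d):
    """Closed form for s = 2r: a_d = 1 - sigma^2 / ((d^r - M_r)^2 + sigma^2)."""
    sigma2 = M_2r - M_r**2
    return 1.0 - sigma2 / ((d**r - M_r)**2 + sigma2)


# s = 2r example: r = 1, s = 2, M_1 = 1, M_2 = 2, d = 3.
delta0_a, a_d_a = sharp_lower_limit(1.0, 2.0, 1, 2, 3.0)
# s != 2r example: r = 1, s = 3, M_1 = 1, M_3 = 2, d = 2.
delta0_b, a_d_b = sharp_lower_limit(1.0, 2.0, 1, 3, 2.0)
print(delta0_a, a_d_a, closed_form_s_equals_2r(1.0, 2.0, 1, 3.0))
print(delta0_b, a_d_b)
```

In the second example no closed form is available, but the two-point distribution with mass $a_d$ at $\delta_0$ and $1 - a_d$ at $d$ can be checked to reproduce both prescribed moments.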


Cowles Commission, Colorado Springs. 


¹ The case $s = 2r$ has been treated also by Cantelli. He demonstrated the formula (19)
in quite another way, which cannot be generalized for the case $s \neq 2r$. Cantelli's formula
and its demonstration are given in the book of M. Fréchet, Généralités sur les Probabilités.
Variables Aléatoires, Paris, 1937, pp. 123-126.

² As has been shown, there exists exactly one positive root $\neq d$ of the equation
considered.



A MODIFICATION OF BAYES' PROBLEM


By R. v. Mises

The classical Bayes problem can be stated as follows. We consider an urn
which contains white and black balls (or balls designated by 0 and 1). The
probability $p$ for drawing a black ball is unknown. But there is given a
probability function $F(x)$ representing the a priori probability for the inequality
$p \le x$. We draw $n$ times from the urn (returning each time the extracted ball)
and get a black ball $m$ times and a white one $n - m$ times. Now, after this
experiment, we ask for the a posteriori probability $P_n(x)$ for the relation $p \le x$.
The solution proposed by Bayes can be written in a slightly generalized form:

(1)  $P_n(x) = K \int_0^x p^m (1 - p)^{n-m}\, dF(p),$

where $K$ is a constant to be found by means of the condition

(1')  $P_n(1) = 1.$

We are interested in the behaviour of $P_n(x)$ if $n$ tends to $\infty$ under the condition

(2)  $\lim_{n \to \infty} \dfrac{m}{n} = \alpha.$

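Laplace found in the case of a priori equipartition $F(x) = x$, and I proved in 1919¹
for any derivable $F(x)$, that $P_n(x)$ tends to a normal distribution:

(3)  $P_n(x) \sim \dfrac{1}{\sqrt{\pi}} \int_{-\infty}^{u} e^{-t^2}\, dt$  with  $u = H_n (x - A_n)$,

(4)  $A_n = \alpha, \qquad \dfrac{1}{2H_n^2} = \dfrac{\alpha(1 - \alpha)}{n}.$

It is easily seen from (3) and (4) that

(5)  $\lim_{n \to \infty} P_n(x) = 0$ if $x < \alpha$, and $= 1$ if $x > \alpha$.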


Let us now consider a slightly modified form of the problem.² Instead of one
urn we suppose there are given $n$ urns each containing white and black balls.
The probability $p_\nu$ for drawing a black ball from the $\nu$-th urn is unknown, but is
subject to an a priori probability function $F(x)$ which furnishes the a priori
probability for the relation $p_\nu \le x$, independently of $\nu$. We assume that on
drawing one ball from every urn a black ball appears $m$ times and a white ball
$n - m$ times.

¹ Mathematische Zeitschrift, vol. 4 (1919) p. 92. Cf. my textbook Wahrscheinlichkeitsrechnung
und ihre Anwendungen, Wien-Leipzig 1931, p. 158. Later I proved the Laplace-Bayes
theorem for a more general class of $F(x)$: Monatshefte für Mathematik und Physik,
vol. 43 (1936) pp. 105-128.

² This modified problem has been treated by S. Bochner, Annals of Math., Vol. 37, 1936.

Putting

(6)  $p = \dfrac{p_1 + p_2 + \cdots + p_n}{n}$

we ask for the a posteriori probability $P_n(x)$ for the relation $p \le x$.
The Bayes formula (1) must now be replaced by

(7)  $P_n(x) = K' \int\!\!\int\!\cdots\!\int_{p_1 + p_2 + \cdots + p_n \le nx} p_1 p_2 \cdots p_m (1 - p_{m+1})(1 - p_{m+2}) \cdots (1 - p_n)\, dF(p_1)\, dF(p_2) \cdots dF(p_n)$

where $K'$ is a constant determined by (1'). It is very easy to examine the
asymptotic character of (7). We shall prove the following
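Theorem: If the first three moments of the a priori distribution $F(x)$

(8)  $b_r = \int_0^1 x^r\, dF(x), \qquad r = 1, 2, 3$

exist and if the dispersion $b_2 - b_1^2$ is different from 0, the a posteriori probability
$P_n(x)$ tends for $n \to \infty$ under the condition (2) to the normal distribution (3) with

(9)  $A_n = \alpha\,\dfrac{b_2}{b_1} + (1 - \alpha)\,\dfrac{b_1 - b_2}{1 - b_1},$

     $\dfrac{1}{2H_n^2} = \dfrac{1}{n}\left[\alpha\,\dfrac{b_1 b_3 - b_2^2}{b_1^2} + (1 - \alpha)\,\dfrac{(b_2 - b_3)(1 - b_1) - (b_1 - b_2)^2}{(1 - b_1)^2}\right].$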

In order to prove the theorem we write

(10)  $V_\nu(x) = \dfrac{1}{b_1} \int_0^x z\, dF(z)$, if $\nu = 1, 2, \ldots, m$;

      $V_\nu(x) = \dfrac{1}{1 - b_1} \int_0^x (1 - z)\, dF(z)$, if $\nu = m + 1, m + 2, \ldots, n$.

Then formula (7) becomes

(11)  $P_n(x) = C \int\!\!\int\!\cdots\!\int_{p_1 + p_2 + \cdots + p_n \le nx} dV_1(p_1) \cdots dV_n(p_n).$

Each $V_\nu(p_\nu)$ is a distribution function, i.e. a non-decreasing function with
$V_\nu(-\infty) = 0$, $V_\nu(\infty) = 1$. Therefore the constant $C$ in (11) is equal to 1 and
the integral represents the distribution function for the arithmetical mean
$(p_1 + p_2 + \cdots + p_n)/n$. According to the Central Limit Theorem of the theory
of probability $P_n(x)$ will converge towards a normal distribution when certain
conditions are satisfied. In every case, if $a_\nu$, $s_\nu^2$ denote the mean value and the
dispersion associated with $V_\nu(x)$, then the mean value $A_n$ and the dispersion
$B_n^2$ associated with $P_n(x)$ will be defined by

(12)  $A_n = \dfrac{1}{n} \sum_{\nu=1}^{n} a_\nu, \qquad B_n^2 = \dfrac{1}{n^2} \sum_{\nu=1}^{n} s_\nu^2.$

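We find from (10)

(13)  $a_\nu = \int_0^1 x\, dV_\nu(x) = \dfrac{1}{b_1} \int_0^1 x^2\, dF(x) = \dfrac{b_2}{b_1}$, if $\nu = 1, 2, \ldots, m$;

      $a_\nu = \dfrac{1}{1 - b_1} \int_0^1 x (1 - x)\, dF = \dfrac{b_1 - b_2}{1 - b_1}$, if $\nu = m + 1, \ldots, n$;

(14)  $s_\nu^2 = \int_0^1 x^2\, dV_\nu(x) - a_\nu^2 = \dfrac{b_3}{b_1} - \dfrac{b_2^2}{b_1^2}$, if $\nu = 1, 2, \ldots, m$;

      $s_\nu^2 = \dfrac{b_2 - b_3}{1 - b_1} - \dfrac{(b_1 - b_2)^2}{(1 - b_1)^2}$, if $\nu = m + 1, \ldots, n$.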


We supposed the dispersion of $F(x)$ to be different from zero. It follows that

(15)  $b_1 \neq 0, \quad 1 - b_1 \neq 0, \quad b_1 b_3 - b_2^2 \neq 0, \quad (b_2 - b_3)(1 - b_1) - (b_1 - b_2)^2 \neq 0.$

For $b_1 = 0$ would imply that $dF(x) = 0$ for all $x > 0$ and $1 - b_1 = 0$ that $dF(x) = 0$
for all $x < 1$; in both cases the dispersion would be zero. On the other
hand, it is easily seen that the relation $b_1 b_3 - b_2^2 = 0$ is not compatible with the
condition of a non-vanishing a priori dispersion and that the same is true for
the relation $(b_2 - b_3)(1 - b_1) - (b_1 - b_2)^2 = 0$.

The total dispersion $\sum s_\nu^2$ is equal to the sum of $m$ times the value $(b_1 b_3 - b_2^2)/b_1^2$
and $n - m$ times the value $[(b_2 - b_3)(1 - b_1) - (b_1 - b_2)^2]/(1 - b_1)^2$.

Thus we see that under the condition (2) the sum $\sum s_\nu^2$ tends to $\infty$, while the
ratio $s_\nu^2 / \sum s_\nu^2$ tends to zero, if $n$ increases infinitely. These are sufficient conditions
for the validity of the Central Limit Theorem.³ The values given for $A_n$ and $H_n$
in (9) follow from (12), (13), (14) and the well known relation $2 H_n^2 B_n^2 = 1$.

S. Bochner in his previously quoted paper found, in a more complicated manner,
the value of $A_n$ and only showed that $P_n(x)$ tends to zero if $x < A_n$ and to
1 if $x > A_n$.

Examples. If we assume the a priori probability to be uniform, i.e. $F(x) = x$,
we have

$b_1 = \tfrac{1}{2}, \quad b_2 = \tfrac{1}{3}, \quad b_3 = \tfrac{1}{4}$

and therefore from (9)

$A_n = \tfrac{1}{3}(\alpha + 1), \qquad \dfrac{1}{2H_n^2} = \dfrac{1}{18n}.$

³ Cf. H. Cramér, Random Variables and Probability Distributions, Cambridge Tracts in
Mathematics and Mathematical Physics, No. 36, 1937, p. 60.


A more general case is that of a more concentrated a priori probability function

$F'(x) = C x^k (1 - x)^l, \qquad C = \dfrac{(k + l + 1)!}{k!\, l!}.$

Here we find

$b_1 = \dfrac{k + 1}{k + l + 2}, \qquad b_2 = \dfrac{(k + 1)(k + 2)}{(k + l + 2)(k + l + 3)},$

$b_3 = \dfrac{(k + 1)(k + 2)(k + 3)}{(k + l + 2)(k + l + 3)(k + l + 4)}$

and the values of $A_n$ and $H_n^2$ are

$A_n = \dfrac{k + 1 + \alpha}{k + l + 3}, \qquad \dfrac{1}{2H_n^2} = \dfrac{\alpha (l - k) + (k + 1)(l + 2)}{n (k + l + 3)^2 (k + l + 4)}.$

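By introducing the moments of $F(x)$ relative to the mean value, i.e.

$B_2 = \int_0^1 (x - b_1)^2\, dF = b_2 - b_1^2,$

$B_3 = \int_0^1 (x - b_1)^3\, dF = b_3 - 3 b_1 b_2 + 2 b_1^3,$

we can transform the general formulas (9) into

(17)  $A_n = b_1 + (\alpha - b_1)\,\dfrac{B_2}{b_1 (1 - b_1)},$

      $\dfrac{1}{2H_n^2} = \dfrac{1}{n}\left[B_2 + (\alpha - b_1)\,\dfrac{B_3}{b_1 (1 - b_1)} - B_2^2\,\dfrac{b_1^2 + \alpha (1 - 2 b_1)}{b_1^2 (1 - b_1)^2}\right].$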

The first of these equations shows that the a posteriori mean value $A_n$ (for
all $n$) is equal to the a priori mean value $b_1$, if the experimental mean $m/n$ or $\alpha$
coincides with the latter. On the other hand, in the case of a symmetric a priori
distribution ($b_1 = \tfrac{1}{2}$, $B_3 = 0$) the second equation is reduced to

$\dfrac{1}{2H_n^2} = \dfrac{1}{n}\, B_2 (1 - 4 B_2).$

On the whole it is remarkable that the influence of the a priori probability does
not vanish for $n \to \infty$, in the case of our modified Bayes problem.⁴ The
explanation of this fact is to be found in a more generalized theory of the inverse
problems in probability.


University of Istanbul, Turkey.


⁴ Cf. my papers quoted in footnote 1.


ON THE PROBABILITY THEORY OF ARBITRARILY LINKED EVENTS

By Hilda Geiringer

1. Introduction. The classical Poisson problem can be stated as follows:
Let $p_1, p_2, \ldots, p_n$ be the probabilities of $n$ independent events $E_1, E_2, \ldots, E_n$
respectively; i.e. the probability of the simultaneous occurrence of $E_i$ and $E_j$
is equal to $p_i p_j$, that of $E_i$, $E_j$, $E_k$ is equal to $p_i p_j p_k$ and so on. We seek the
probability $P_n(x)$ that $x$ of the events shall occur. If $p_1 = p_2 = \cdots = p_n$
the problem is known as the Bernoulli problem.

More generally the $n$ events may be regarded as dependent. Let $p_{ij}$ be the
probability of the simultaneous occurrence of $E_i$ and $E_j$; $p_{ijk}$ that of $E_i$, $E_j$, $E_k$
and finally $p_{12 \ldots n}$ that of $E_1, E_2, \ldots, E_n$. There shall arise again the problem
of determining the probability $P_n(x)$ that $x$ of the $n$ events will take place.¹
Furthermore the asymptotic behaviour of $P_n(x)$ for large $n$ can be studied; and
we shall especially be interested in the problem of the convergence of $P_n(x)$
towards a normal distribution or a Poisson distribution.

Even in the general case which we just explained, the sums

$S_1 = \sum_{i=1}^{n} p_i, \quad S_2 = \sum_{i<j} p_{ij}, \quad \ldots, \quad S_n = p_{12 \ldots n}$

of our probabilities differ only by constant factors from the factorial moments
$M^{(1)}, M^{(2)}, \ldots, M^{(n)}$ of $P_n(x)$. For we have

$z!\, S_z = M^{(z)} = \sum_{x} x (x - 1) \cdots (x - z + 1)\, P_n(x).$
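For independent events (the Poisson problem) the relation $z!\, S_z = M^{(z)}$ is easy to verify directly, since $S_z$ is then the $z$-th elementary symmetric polynomial of $p_1, \ldots, p_n$. The following is a small sketch under that assumption; the function names and the sample probabilities are illustrative and do not come from the paper.

```python
from fractions import Fraction
from math import factorial


def elementary_symmetric(p):
    """S_0..S_n for independent events: elementary symmetric polynomials of p."""
    S = [Fraction(1)] + [Fraction(0)] * len(p)
    for pi in p:
        # Update from high to low degree so each p_i is used at most once per term.
        for z in range(len(p), 0, -1):
            S[z] += pi * S[z - 1]
    return S


def distribution_independent(p):
    """P_n(x) for independent events, built by successive convolution."""
    P = [Fraction(1)]
    for pi in p:
        nxt = [Fraction(0)] * (len(P) + 1)
        for x, q in enumerate(P):
            nxt[x] += (1 - pi) * q
            nxt[x + 1] += pi * q
        P = nxt
    return P


def factorial_moment(P, z):
    """M^(z) = sum over x of x(x-1)...(x-z+1) P(x)."""
    total = Fraction(0)
    for x, q in enumerate(P):
        term = q
        for j in range(z):
            term *= (x - j)
        total += term
    return total


# Hypothetical probabilities chosen only for illustration.
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5), Fraction(3, 4)]
S = elementary_symmetric(p)
P = distribution_independent(p)
for z in range(len(p) + 1):
    print(z, factorial(z) * S[z], factorial_moment(P, z))
```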

Starting from this remark the author has, in earlier papers [8, 9, 10], established
a theory of the asymptotic behaviour of $P_n(x)$, making use of the theory of
moments. The criterion for the convergence of $P_n(x)$ towards the normal or
the Poisson distribution consists of certain conditions² which the $S_z$ must
satisfy.

In the following section a concise statement of the whole problem will be
given, independently of the author's earlier publications. For the convergence
towards the normal distribution we shall be able to establish a theorem under
wider conditions in a manner which seems to be simpler. Finally, some
applications of the theory will be considered.

¹ See, for instance, references [1]-[7] at end of paper.

² Using the "theorem of the continuity of moments," Professor v. Mises [11] established
sufficient conditions for the convergence of $P_n(x)$ towards a Poisson distribution in the
case of the problem of "iterations." However, his reasoning can be applied to the general
case without much difficulty.




2. Formulation of the problem. Let us consider the $n$-dimensional collective
(Kollektiv) consisting of a sequence of any $n$ trials. In the simplest case these
trials will be alternatives, i.e. for every trial there will exist only two results,
which we may denote by "occurrence," "non-occurrence" or by "1," "0." The
single trial may eventually be composed in various manners. For instance
we may draw $m > n$ times from an urn, which contains counters, bearing in
arbitrary proportions numbers from 0 to 9. The first "event" $E_1$ may consist
of the fact that the first three extracted counters bear even numbers; the second
trial $E_2$ will be regarded as successful, if the sum of the counters extracted at
the second, third and fourth drawings is greater than five, etc. In every case
the result of the $n$ trials will be expressed by $n$ numbers, each of them equal to
0 or 1. The result $(1, 1, 0, 0, 0, \ldots, 1)$, for instance, means that the first, the
second, and the last trial were successful, the third, fourth, ... unsuccessful,
and we have an arithmetical probability distribution $v(x_1, x_2, \ldots, x_n)$ ($x_k = 0, 1$;
$k = 1, 2, \ldots, n$), where

(1)  $\sum_{x_1} \cdots \sum_{x_n} v(x_1, x_2, \ldots, x_n) = 1.$

Instead of the $2^n - 1$ values of $v$ we will deal with certain groups of partial
sums of them; the first is

$\sum_{x_1} \cdots \sum_{x_{i-1}} \sum_{x_{i+1}} \cdots \sum_{x_n} v(x_1, x_2, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n) = p_i \qquad (i = 1, 2, \ldots, n)$

where $p_i$ is the probability that the i-th trial will be successful. In an analogous
manner let $p_{ij}$ be the probability that the i-th and the j-th trial are both successful,
$p_{ijk}$ the probability that the i-th, j-th and k-th trials are simultaneously
successful. Let us provisionally denote by $\sum^{(i)}$ an $(n - 1)$-tuple sum over all
variables except $x_i$, by $\sum^{(ij)}$ an $(n - 2)$-tuple sum over all variables except $x_i$
and $x_j$, etc. We shall then have:

(2)  $p_i = \sum^{(i)} v(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n)$
     $p_{ij} = \sum^{(ij)} v(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_{j-1}, 1, x_{j+1}, \ldots, x_n)$
     $\cdots$
     $p_{12 \ldots n} = v(1, 1, \ldots, 1).$

In the following these probabilities $p_i$, $p_{ij}$, $p_{ijk}$, ... will be assumed as directly
given. There are

$\binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n - 1$

values of this kind and it is easily seen that the partial sums (2) are linearly
independent.

If, especially, the probability $v(x_1, x_2, \ldots, x_n)$ depends only on the number
of zeros amongst $x_1, x_2, \ldots, x_n$, i.e. if

$v(1, 0, \ldots, 0) = v(0, 1, 0, \ldots, 0) = \cdots = v(0, 0, \ldots, 1)$
$v(1, 1, \ldots, 0) = v(1, 0, 1, \ldots, 0) = \cdots = v(0, 0, \ldots, 0, 1, 1)$

the value of $p_i$ is independent of $i$, the value of $p_{ij}$ independent of $i$ and $j$,
and so on:

$p_1 = p_2 = \cdots = p_n$

$p_{12} = p_{13} = \cdots = p_{n-1,n}$


In the particular case of independent events we have only to deal with $n$
probabilities, namely $p_1, p_2, \ldots, p_n$. We have indeed $p_{ij} = p_i p_j$; $p_{ijk} = p_i p_j p_k$;
...; $p_{12 \ldots n} = p_1 p_2 \cdots p_n$.

In the case of chains however, we need only know $(2n - 1)$ values, namely
$p_1, p_2, \ldots, p_n$; $p_{12}, p_{23}, \ldots, p_{n-1,n}$. The other $p_{ij}$, and the $p_{ijk}, \ldots, p_{12 \ldots n}$
can be expressed in terms of the above probabilities.

Returning now to the general case it is easily seen that in the expression for
$P_n(x)$ the $p_i$, $p_{ij}$, ... will appear only in the following combinations

(3)  $S_n(0) = 1, \quad S_n(1) = \sum_{i=1}^{n} p_i, \quad S_n(2) = \sum_{i<j} p_{ij}, \quad \ldots, \quad S_n(n) = p_{12 \ldots n}.$

Indeed, at the basis of the solution of the "problem of sums," there are the
following relations [11] between the $S_n(z)$ and the $P_n(x)$:

(4)  $S_n(z) = \sum_{x=z}^{n} \binom{x}{z} P_n(x) \qquad (z = 0, \ldots, n).$

The linear equations (4) may be solved (by recurrence) for the $P_n(x)$ and we
find the important result that

(5)  $P_n(x) = \sum_{z=x}^{n} (-1)^{z+x} \binom{z}{x} S_n(z) \qquad (x = 0, \ldots, n).$

Let $M^{(z)}$ be the z-th factorial moment of $P_n(x)$, i.e.

(6)  $M^{(z)} = \sum_{x} x (x - 1) \cdots (x - z + 1)\, P_n(x).$

Making use of (4) and (6) we obtain

(7)  $M^{(z)} = z!\, S_n(z).$
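Relations (3) to (7) can be illustrated on a small dependent example. The sketch below is an assumption-laden illustration, not the author's procedure: the joint distribution `v` on $\{0,1\}^3$ is invented, and the function names are hypothetical. It computes the $S_n(z)$ from the joint distribution, recovers $P_n(x)$ through the inversion formula (5), and compares the result with a direct count.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def sums_from_joint(v, n):
    """S_n(z) as in (3): sum, over all z-subsets, of P(all trials in the subset succeed)."""
    S = []
    for z in range(n + 1):
        total = Fraction(0)
        for subset in combinations(range(n), z):
            total += sum(prob for outcome, prob in v.items()
                         if all(outcome[i] == 1 for i in subset))
        S.append(total)
    return S


def distribution_from_sums(S):
    """Formula (5): P_n(x) = sum_{z >= x} (-1)^(z+x) C(z, x) S_n(z)."""
    n = len(S) - 1
    return [sum((-1) ** (z + x) * comb(z, x) * S[z] for z in range(x, n + 1))
            for x in range(n + 1)]


# A made-up dependent joint distribution on {0,1}^3 (illustration only).
n = 3
weights = {o: 1 + o[0] + 2 * o[0] * o[1] + 3 * o[1] * o[2]
           for o in product((0, 1), repeat=n)}
norm = sum(weights.values())
v = {o: Fraction(w, norm) for o, w in weights.items()}

S = sums_from_joint(v, n)
P = distribution_from_sums(S)
direct = [sum(prob for o, prob in v.items() if sum(o) == x) for x in range(n + 1)]
print(P)
print(direct)
```

The subset enumeration grows as $2^n$, so this form is only meant for checking small cases.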

Our aim is to obtain information concerning the asymptotic behaviour of $P_n(x)$
by studying that of the moments of $P_n(x)$. The moments however are easily
seen to be given in terms of the $S_n(z)$.


3. The asymptotic behavior of $P_n(x)$. Convergence towards the normal
distribution.

a. The Principal Theorem. According as the mean value

(8)  $M^{(1)} = S_n(1) = a_n = \sum_{\nu=1}^{n} p_\nu$

remains bounded or not for indefinitely increasing $n$, there are two types of
passage to a limit. In the first case the distribution will converge (under certain
conditions) towards a Poisson distribution; in the second case it will approach
(under certain conditions) a normal distribution. As regards the convergence
towards the Poisson distribution the author has published [9] a sufficient
condition, which seems to be quite simple and general. We shall, however, not
resume this problem in the present paper.

We propose, indeed, to prove in the following pages a new theorem concerning
the convergence of

$V_n(x) = \sum_{t \le x} P_n(t)$

towards a normal distribution.

For this purpose we introduce the following function of the discontinuous
variable $z = 0, 1, 2, \ldots, n$

(9)  $g_n(z) = \dfrac{z + 1}{a_n}\, \dfrac{S_n(z + 1)}{S_n(z)}$

or, more concisely written, $g_z = \dfrac{z + 1}{a}\, \dfrac{S_{z+1}}{S_z}$, where $S_n(z)$ is defined by (3).
Putting $z = a_n u$, let us consider

(10)  $g_n(a_n u) = h_n(u)$

where $u$ is regarded as a continuous variable in the interval from 0 to $\epsilon$ ($\epsilon > 0$).
Denoting the variance of $x$ for $V_n(x)$ by $M_2 = s_n^2$ we shall prove the

Theorem: Let the function $h_n(u)$, defined by (10), satisfy the following conditions:

(i) If $n$ is sufficiently large, $h_n(u)$ admits derivatives of every order in the interval
$(0, \epsilon)$.

(ii) At $u = 0$, the first derivative of $h_n(u)$ has a limit, for $n \to \infty$, which is different
from $-1$.

(iii) If $u$ is in the interval $(0, \epsilon)$ the k-th derivative of $h_n(u)$ remains, for every $k$,
inferior to a bound $N_k$ which is independent of $n$.

Then

(11)  $\lim_{n \to \infty} V_n(a_n + y\, s_n \sqrt{2}) = \dfrac{1}{\sqrt{\pi}} \int_{-\infty}^{y} e^{-x^2}\, dx.$

We shall see that in many applications these conditions may reasonably be
assumed as satisfied.
b. Demonstration of the Theorem.

In order to prove the principal theorem, stated above, we shall at first deduce
some properties of the (finite) differences of $g_n(z)$ ($z = 0, 1, \ldots$) from the
assumptions (i), (ii), (iii) which deal with the derivatives of $h_n(u)$. Indeed, the k-th
difference of $g_n(z)$ with respect to $z$ (which contains the values of $g_n(z)$ for
$z = 0, 1, \ldots, k$) differs only by the factor $a_n^k$ from the k-th divided difference of
$h_n(u)$ with respect to $u$ (which is formed by the values of $h_n(u)$ for $u = 0,
1/a_n, 2/a_n, \ldots, k/a_n$). Let $n > k$ and so large that $k/a_n < \epsilon$; then all u-values used in
the formation of the divided difference of $h_n(u)$ will be in the interval $(0, \epsilon)$.
Now, as it is well known, the absolute value of any divided difference of order $k$
can not be larger than the largest derivative in an interval which contains all
the abscissae used in the formation of the divided difference. But according
to hypothesis (iii) the k-th derivatives of $h_n(u)$ in $(0, \epsilon)$ are all inferior to $N_k$.
Therefore³ we have

(12)  $\left| a_n^k\, \Delta^k g_n(z) \big|_{z=0} \right| < N_k$

and for every $\gamma > 0$

(13)  $\lim_{n \to \infty} a_n^{k - \gamma}\, \Delta^k g_n(z) \big|_{z=0} = 0.$

On the other hand from condition (ii) it follows, as is easily seen, that

(14)  $\lim_{n \to \infty} a_n\, \Delta g_n(z) \big|_{z=0} = \lim_{n \to \infty} a_n [g_n(1) - g_n(0)] = c.$

The equations (13) and (14) imply but finite differences of $g_n(z)$.
Let us now introduce certain new moments $F_\nu$ which we could call "factorial
moments about the mean." They are indeed related to the factorial moments
$M^{(\nu)}$ in exactly the same way as the moments $M_\nu$ about the mean are related
to the moments $M'_\nu$ about the origin. Writing $S_z$, $a$ and $g_z$ instead of $S_n(z)$,
$a_n$ and $g_n(z)$, we set

(15)  $F_\nu = M^{(\nu)} - \binom{\nu}{1} M^{(\nu-1)} a + \binom{\nu}{2} M^{(\nu-2)} a^2 - \cdots \pm a^\nu$

      $= \nu!\, S_\nu - \binom{\nu}{1} (\nu - 1)!\, S_{\nu-1}\, a + \binom{\nu}{2} (\nu - 2)!\, S_{\nu-2}\, a^2 - \cdots \pm a^\nu,$

where, particularly,

(16)  $F_0 = 1, \qquad F_1 = 0.$

From (15) we have

(17)  $M^{(\nu)} = F_\nu + \binom{\nu}{1} F_{\nu-1}\, a + \binom{\nu}{2} F_{\nu-2}\, a^2 + \cdots + \binom{\nu}{\nu-2} F_2\, a^{\nu-2} + a^\nu.$

Let us begin by proving the following

³ If we only want to deduce (13) it is sufficient to suppose that $N_k$ (without being
independent of $n$) increases more slowly than any power of $a_n$.



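Lemma I: It follows from (13) and (14) that we have for the $F_\nu$ defined by (15)

(18)  $\lim_{n \to \infty} \dfrac{F_\nu}{a^{\nu/2}} = G_\nu = \begin{cases} 0 & \text{if } \nu \text{ odd,} \\ 1 \cdot 3 \cdots (\nu - 1)\, c^{\nu/2} & \text{if } \nu \text{ even.} \end{cases}$

First we conclude from (15) and (14) that (18) is true for $\nu = 1$ and $\nu = 2$.
In order to prove (18) for every $\nu$, we shall point out that

(19)  $\lim_{n \to \infty} \dfrac{F_\nu}{a^{\nu/2}} = (\nu - 1)\, c \lim_{n \to \infty} \dfrac{F_{\nu-2}}{a^{(\nu-2)/2}} \qquad (\nu = 2, 3, \ldots).$

Setting

(20)  $f_z = g_z - 1$ and $m_z = \dfrac{z!\, S_z}{a^z},$

we get

(21)  $\dfrac{m_{z+1}}{m_z} = g_z$

and

(22)  $\Delta m_z = m_z f_z, \qquad \Delta^{\kappa} m_z = \Delta^{\kappa - 1} (m_z f_z).$

But according to (15) we have

(23)  $\Delta^{\nu} m_z \big|_{z=0} = \dfrac{F_\nu}{a^\nu} \qquad (\nu = 0, 1, 2, \ldots)$

and therefore

(24)  $\dfrac{F_\nu}{a^\nu} = \Delta^{\nu - 1} (m_z f_z) \big|_{z=0} = \sum S_{\alpha\beta}\, \dfrac{F_\alpha}{a^\alpha}\, \Delta^{\beta} f_z \big|_{z=0}$

      $(\alpha + \beta \ge \nu - 1; \ \alpha \le \nu - 1; \ \beta \le \nu - 1).$

Here we have made use of the fact that the $\kappa$-th difference of a product $uv$ can
be transformed in a finite sum $\sum S_{\alpha\beta}\, \Delta^{\alpha} u\, \Delta^{\beta} v$, where $\alpha$ and $\beta$ are non-negative integers
and $\alpha \le \kappa$, $\beta \le \kappa$. (If we concern ourselves with derivatives and not with finite
differences, we have $\alpha + \beta = \kappa$ and $S_{\alpha\beta} = \binom{\kappa}{\alpha}$.) Suppose

$\alpha + \beta > \nu - 1.$

Then $\beta \ge \nu - \alpha$; therefore, as $\nu > \alpha$, we have $\beta > \dfrac{\nu - \alpha}{2}$. Since $\Delta^{\beta} f_z = \Delta^{\beta} g_z$,
the product $a^{(\nu - \alpha)/2}\, \Delta^{\beta} f_z \big|_{z=0}$ converges toward zero, in accordance with (13), whereas the
factor $S_{\alpha\beta}\, \dfrac{F_\alpha}{a^{\alpha/2}}$ remains bounded for every $\alpha < \nu$. Now suppose

$\alpha + \beta = \nu - 1.$

Then $\beta = \nu - 1 - \alpha$. First let $\alpha < \nu - 2$; then $\beta = \nu - 1 - \alpha > \dfrac{\nu - \alpha}{2}$.
Thus $a^{(\nu - \alpha)/2}\, \Delta^{\beta} f_z \big|_{z=0}$ converges again towards zero, whereas the other factors are
bounded as before. Next, if $\alpha = \nu - 1$, then $\beta = 0$ and $\Delta^0 f_z \big|_{z=0} = f_0 = 0$. Thus
the corresponding term of our sum is equal to zero. Finally if $\alpha = \nu - 2$,
then $\beta = 1$, and $S_{\alpha\beta} = \nu - 1$. The corresponding term of the sum (24) will be

$(\nu - 1) \lim_{n \to \infty} a\, \Delta f_z \big|_{z=0} \cdot \lim_{n \to \infty} \dfrac{F_{\nu-2}}{a^{(\nu-2)/2}} = (\nu - 1)\, c \lim_{n \to \infty} \dfrac{F_{\nu-2}}{a^{(\nu-2)/2}},$

which completes the proof of Lemma I.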

We shall now establish a relation between the factorial moments about the
mean $F_\nu$ and the ordinary moments about the mean $M_\nu$. To an expression
of the form

(25)  $c\, a^{p} F_\kappa$

(where the constant $c$ is independent of $n$) let us attribute a "weight" $p + \dfrac{\kappa}{2}$.
Then we shall prove the following lemma.

Lemma II: Let $\nu = 2\mu$ ($\nu$ even), $\nu = 2\mu + 1$ ($\nu$ odd) and

(26)  $\alpha_\rho = \dfrac{\nu!}{(\nu - 2\rho)!\, 2^{\rho}\, \rho!}.$

Then

(27)  $M_\nu - \sum_{\rho=0}^{\mu} \alpha_\rho\, a^{\rho} F_{\nu - 2\rho}$

is equal to a finite sum of terms of the form (25), each of which has a weight less
than $\nu/2$.

To prove this lemma we begin by expressing the $M_\nu$ in terms of the factorial
moments $M^{(\rho)}$. We shall then express the $M^{(\rho)}$ by the $F_\kappa$. Now, let $S_z^{\rho}$ be
the "Stirling numbers of second kind," i.e., putting

(28)  $x^{(z)} = x (x - 1) \cdots (x - z + 1)$

we have

(29)  $x^z = \sum_{\rho=0}^{z} S_z^{\rho}\, x^{(\rho)} \qquad (z = 0, 1, 2, \ldots).$

Then by an elementary calculation we obtain

(30)  $M_\nu = \sum_{\rho=0}^{\nu} M^{(\rho)} \sum_{j=0}^{\nu - \rho} (-1)^j \binom{\nu}{j} a^j\, S_{\nu - j}^{\rho}.$

If we now introduce the $F_\kappa$ we get





(31) 




V - p - 

r 


(32) 


Furthermore we may easily verify that 


- 2 2 ) V ’- 1 ~ ± ( r ) s " M ]' 


•iC 


,<»> 


+ 


i 


~ r — p) 1 


(p-r-p) 


But the $S_z^{z - \rho}$ for $z = 0, 1, 2, \ldots$ are equal to the values of a polynomial in $z$, of
degree $2\rho$, the highest term of which is equal to $\dfrac{z^{2\rho}}{2^{\rho}\, \rho!}$. The degree of the product


(33) 


( V T-X 


<p(x) 


is therefore equal to $(\nu - r - \rho) + 2\rho = \nu - r + \rho$. On the other hand the
expression between brackets in the right hand member of (31) is nothing other
than the $\nu$-th difference of $\varphi(x)$. (The missing terms of this difference are indeed
equal to zero, the corresponding Stirling numbers being equal to zero.)

This $\nu$-th difference will certainly vanish if

$\nu - r + \rho < \nu$, i.e. $r > \rho$.

Now, let $r = \rho$. Then the $\nu$-th difference, i.e. the coefficient $\alpha_\rho$ of $F_{\nu - r - \rho}\, a^r
= F_{\nu - 2\rho}\, a^{\rho}$ in (31), is equal to $\nu!$ multiplied by the coefficient of $x^{\nu}$ in $\varphi(x)$:

$\alpha_\rho = \nu! \cdot \dfrac{1}{(\nu - 2\rho)!\, 2^{\rho}\, \rho!}.$

Finally, let $r < \rho$. Then the weight of $F_{\nu - r - \rho}\, a^r$ is inferior to $\nu/2$. We have thus
established Lemma II.

We have for instance for $\nu = 1, 2, 3, 4, 5$

$M_1 = F_1 = 0, \qquad M_2 = F_2 + a, \qquad M_3 = F_3 + 3F_2 + a$

$M_4 = (F_4 + 6 a F_2 + 3 a^2) + 6 F_3 + (7 F_2 + a)$

$M_5 = (F_5 + 10 F_3\, a) + (10 F_4 + 40 F_2\, a + 10 a^2) + 25 F_3 + (15 F_2 + a)$

Inversely in an analogous manner, we can express $F_\nu$ by the
$M_\rho$ ($\rho = 1, 2, \ldots, \nu$).
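The expansions of $M_2, \ldots, M_5$ in terms of the $F_\nu$ can be confirmed on any finite distribution. The sketch below is illustrative only: the sample distribution and the function names are assumptions. It builds the $F_\nu$ from the factorial moments as in (15) and compares both sides with exact rational arithmetic.

```python
from fractions import Fraction
from math import comb


def factorial_moment(P, z):
    """M^(z) = sum over x of x(x-1)...(x-z+1) P(x); equals 1 for z = 0."""
    total = Fraction(0)
    for x, q in enumerate(P):
        term = q
        for j in range(z):
            term *= (x - j)
        total += term
    return total


def F(P, nu):
    """Factorial moment about the mean, formula (15)."""
    a = factorial_moment(P, 1)
    return sum((-1) ** j * comb(nu, j) * factorial_moment(P, nu - j) * a ** j
               for j in range(nu + 1))


def central_moment(P, nu):
    """Ordinary moment about the mean, M_nu."""
    a = factorial_moment(P, 1)
    return sum((x - a) ** nu * q for x, q in enumerate(P))


# An arbitrary distribution on 0..5 (assumption for illustration).
P = [Fraction(w, 10) for w in (1, 2, 3, 1, 2, 1)]
a = factorial_moment(P, 1)
F2, F3, F4, F5 = (F(P, nu) for nu in (2, 3, 4, 5))

rhs = {
    2: F2 + a,
    3: F3 + 3 * F2 + a,
    4: (F4 + 6 * a * F2 + 3 * a ** 2) + 6 * F3 + (7 * F2 + a),
    5: (F5 + 10 * F3 * a) + (10 * F4 + 40 * F2 * a + 10 * a ** 2) + 25 * F3 + (15 * F2 + a),
}
for nu, value in rhs.items():
    print(nu, central_moment(P, nu), value)
```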

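We can now terminate our demonstration by proving the following

Lemma III: If the conditions (18) are satisfied, then

(34)  $\lim_{n \to \infty} \dfrac{M_\nu}{M_2^{\nu/2}} = H_\nu = \begin{cases} 0 & \nu \text{ odd,} \\ 1 \cdot 3 \cdots (\nu - 1) & \nu \text{ even.} \end{cases}$

First the equation (18) for $\nu = 2$ gives

$\lim_{n \to \infty} \dfrac{F_2}{a} = \lim_{n \to \infty} \dfrac{M_2 - a}{a} = c;$

thus

(35)  $\lim_{n \to \infty} \dfrac{M_2}{a} = 1 + c \qquad (c \neq -1).$

It is therefore obviously sufficient to prove the relation

(36)  $\lim_{n \to \infty} \dfrac{M_\nu}{a^{\nu/2}} = H_\nu\, (1 + c)^{\nu/2}.$

Putting $\nu = 2\mu$ and $\nu = 2\mu + 1$ respectively we obtain however from our
lemma

(37)  $\dfrac{M_\nu}{a^{\nu/2}} = \sum_{\rho=0}^{\mu} \alpha_\rho\, \dfrac{F_{\nu - 2\rho}}{a^{(\nu - 2\rho)/2}} + \dfrac{R}{a^{\nu/2}}.$

Here $R$ represents a finite sum of terms of the form (25), of "weight" inferior
to $\nu/2$. But by virtue of (18) such a term, divided by $a^{\nu/2}$, converges towards zero
and we obtain

$\lim_{n \to \infty} \dfrac{M_\nu}{a^{\nu/2}} = \sum_{\rho=0}^{\mu} \alpha_\rho \lim_{n \to \infty} \dfrac{F_{\nu - 2\rho}}{a^{(\nu - 2\rho)/2}} = \sum_{\rho=0}^{\mu} \dfrac{\nu!}{(\nu - 2\rho)!\, 2^{\rho}\, \rho!}\, G_{\nu - 2\rho}.$

For an odd $\nu$, $G_{\nu - 2\rho}$ is equal to zero; for an even $\nu$ ($= 2\mu$, say) however, we have

$G_{2\mu - 2\rho} = \dfrac{(2\mu - 2\rho)!}{2^{\mu - \rho}\, (\mu - \rho)!}\, c^{\mu - \rho}$

and we obtain

(38)  $\lim_{n \to \infty} \dfrac{M_{2\mu}}{a^{\mu}} = \sum_{\rho=0}^{\mu} \dfrac{(2\mu)!}{(2\mu - 2\rho)!\, 2^{\rho}\, \rho!} \cdot \dfrac{(2\mu - 2\rho)!}{2^{\mu - \rho}\, (\mu - \rho)!}\, c^{\mu - \rho}$

      $= \dfrac{(2\mu)!}{2^{\mu}\, \mu!} \sum_{\rho=0}^{\mu} \dfrac{\mu!}{\rho!\, (\mu - \rho)!}\, c^{\mu - \rho} = \dfrac{(2\mu)!}{2^{\mu}\, \mu!}\, (1 + c)^{\mu},$

in accordance with (36). Lemma III is therefore proved.

Our principal theorem is now an obvious consequence of the well known
theorem of the continuity of moments. By virtue of this theorem the
convergence of $V_n(a_n + y\, s_n \sqrt{2})$ towards a normal distribution as given by (11)
will indeed be assured if the moments of $V_n$ converge towards the moments of
the corresponding normal distribution; i.e. if (34) is true. Thus our principal
theorem is completely demonstrated.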



4. Some applications.

Example 1. We shall consider the following play as a very simple
application of our theorem: An urn contains $m = 2n$ counters bearing the numbers
$1, 2, \ldots, m$. We draw them all, one after the other, without returning the
counters previously drawn. We ask for the probability $P_{2n}(x)$ that an even counter
will appear at a drawing of even number $x$ times ($0 \le x \le n$).

As can be easily found, we have

$p_2 = p_4 = \cdots = p_{2n} = \tfrac{1}{2}$

$p_{2,4} = p_{2,6} = \cdots = p_{2n-2,2n} = \dfrac{1}{4} \cdot \dfrac{2n - 2}{2n - 1}$

Consequently

(39)  $S_1 = \dfrac{n}{2}, \quad S_2 = \binom{n}{2} \dfrac{1}{4} \cdot \dfrac{2n - 2}{2n - 1}, \quad S_3 = \binom{n}{3} \dfrac{1}{8} \cdot \dfrac{(2n - 2)(2n - 4)}{(2n - 1)(2n - 2)}, \quad \ldots$

      $S_z = \dfrac{1}{2^z} \binom{n}{z} \dfrac{(2n - 2)(2n - 4) \cdots (2n - 2z + 2)}{(2n - 1)(2n - 2) \cdots (2n - z + 1)}.$

From (39) it follows that

(40)  $g_n(z) = \dfrac{n - z}{n} \cdot \dfrac{2n - 2z}{2n - z}.$

Setting $z = \tfrac{1}{2} n u$, we get

(41)  $h_n(u) = \dfrac{(2 - u)^2}{4 - u}.$

The conditions (i), (ii), (iii) of our principal theorem are obviously satisfied if
$\epsilon < 4$ and we have

$h'_n(0) = -\tfrac{3}{4} = c.$

The probability defined above is thus seen to converge [according to (11)] towards a
normal distribution, having a mean equal to $\dfrac{n}{2}$ and a variance $M_2 \sim \dfrac{n}{8}$.
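Example 1 can be checked exactly, because the number of even counters falling on the $n$ even drawings follows a hypergeometric law. The sketch below rests on that observation, which is an added assumption and not stated in the paper; the function names are also hypothetical. It compares (39) and (40) with the hypergeometric distribution and shows the exact variance next to the approximation $n/8$.

```python
from fractions import Fraction
from math import comb


def S(n, z):
    """S_z from (39)."""
    num = den = 1
    for j in range(1, z):
        num *= 2 * n - 2 * j
        den *= 2 * n - j
    return Fraction(comb(n, z) * num, 2 ** z * den)


def g(n, z):
    """g_n(z) from (9), computed from the S_z with a_n = S_1."""
    return Fraction(z + 1) * S(n, z + 1) / (S(n, 1) * S(n, z))


def hypergeometric(n, x):
    """P(x even counters on the n even drawings), out of 2n counters, n of them even."""
    return Fraction(comb(n, x) * comb(n, n - x), comb(2 * n, n))


n = 6
binomial_moments = [sum(comb(x, z) * hypergeometric(n, x) for x in range(n + 1))
                    for z in range(n + 1)]
variance = 2 * S(n, 2) + S(n, 1) - S(n, 1) ** 2
print([S(n, z) for z in range(n + 1)])
print(binomial_moments)
print(variance, Fraction(n, 8))
```

For $n = 6$ the exact variance is $n^2 / (4(2n - 1))$, which approaches $n/8$ as $n$ grows.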


Example 2. Probability of an "occupation." Let $k$ stones be distributed
by chance over $n$ places. Then the probability that any stone will occupy a
certain place will be equal to $1/n$. We ask for the probability $P_n(x)$ that there
shall be $x$ places, every one of which is occupied by exactly $m$ stones.⁴
By certain simple considerations, well known in combinatory calculus,
we obtain:

(42)  $a_n = S_1 = n\, \dfrac{k!}{m!\, (k - m)!} \left(\dfrac{1}{n}\right)^{m} \left(1 - \dfrac{1}{n}\right)^{k - m}$

(43)  $S_z = \dfrac{n!}{z!\, (n - z)!} \cdot \dfrac{k!}{(m!)^z\, (k - mz)!} \left(\dfrac{1}{n}\right)^{mz} \left(1 - \dfrac{z}{n}\right)^{k - mz}$

⁴ The problem presents itself for instance if we ask for the probability that in a certain
county there will be $x$ villages, every one of $m$ inhabitants.
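Let $k/n = \alpha$. From (43) we deduce that

(44)  $g_n(z) = \dfrac{n - z}{n} \cdot \dfrac{(k - mz)(k - mz - 1) \cdots (k - mz - m + 1)}{k (k - 1) \cdots (k - m + 1)} \cdot \dfrac{\left(1 - \frac{z + 1}{n}\right)^{k - mz - m}}{\left(1 - \frac{z}{n}\right)^{k - mz} \left(1 - \frac{1}{n}\right)^{k - m}}.$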


Now, let $n$ and $k$ tend simultaneously to $\infty$, in such a way that $\alpha = \dfrac{k}{n}$ remains
bounded. We get at first

(45)  $\lim_{n \to \infty} \dfrac{a_n}{n} = \dfrac{\alpha^m}{m!}\, e^{-\alpha}.$

As $a_n$ is seen to be of the order of magnitude of $n$ we introduce the new variables

$\dfrac{z}{n} = v$ and $v = u\, \dfrac{a_n}{n}.$

We have then (writing $h$ and $\eta$ instead of $h_n$ and $\eta_n$):

$g_n(z) = g_n(nv) = \eta(v)$

$\eta(v) = \eta\!\left(u\, \dfrac{a_n}{n}\right) = h(u).$

Therefore 




(46) 


" a ~ ,) ( lf «&-■>) 

These formulae show that the k-th derivative of $\eta(v)$ with respect to $v$ contains
only rational expressions [in the denominators of which there appear powers of
$(1 - v)$], and positive powers of a logarithm. The conditions (i) and (iii)
of our principal theorem are therefore satisfied if $\epsilon < 1$. Furthermore we have






-1 - 


a 





n 


(47). 


—m 


(—sK— 2 i J )+*(*“ D 


44 )-(-^) 


and consequently 


lim 

n-*o& 




- a + 2m 

a 


lim — 

n-*co Ifi 


(m - aY \ cT „ 

« /ml ~ 


ot-2 \ 
» / 


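We have thus obtained the interesting result that,

The probability $V_n(x)$ that $x$ places at most are occupied, each one by $m$ stones,
converges towards a normal distribution if $k$ and $n$ tend simultaneously to $\infty$ in
such a way that $\lim \dfrac{k}{n} = \alpha$ is bounded. We have then

(48)  $\lim_{n \to \infty} V_n(a_n + u \sqrt{2}\, s_n) = \phi(u)$

with

(49)  $\lim_{n \to \infty} \dfrac{a_n}{n} = \dfrac{\alpha^m}{m!}\, e^{-\alpha}, \qquad \lim_{n \to \infty} \dfrac{s_n^2}{a_n} = 1 - \dfrac{\alpha^m}{m!}\, e^{-\alpha} \left[1 + \dfrac{(m - \alpha)^2}{\alpha}\right].$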



University of Istanbul,

Istanbul, Turkey.

REFERENCES 

[1] H. Poincaré, Calcul des Probabilités, Paris, 1912

[2] W. Burnside, Theory of Probability, Cambridge, 1928

[3] G. U. Yule, Introduction to the Theory of Statistics, London, 1932

[4] C. Jordan, Acta Litter. Scient. (Szeged), Vol. III (1927), p. 193

[5] C. Jordan, Acta Litter. Scient. (Szeged), Vol. VII (1934), p. 103

[6] E. J. Gumbel, Comptes Rendus, Vol. 202 (1938), p. 1627

[7] E. J. Gumbel, Giorn. Ist. Ital. Att., Vol. XVI (1938)

[8] H. Geiringer, Comptes Rendus, Vol. 204 (1937), p. 1856

[9] H. Geiringer, Comptes Rendus, Vol. 204 (1937), p. 1914

[10] H. Geiringer, Revue Interbalkanique (Athens), Vol. II, pp. 1-26

[11] R. von Mises, Zs. Ang. Math. und Mech., Vol. I (1921)



FIDUCIAL DISTRIBUTIONS IN FIDUCIAL INFERENCE* 

By S. S. Wilks


1. Introduction. The essential idea involved in the method of argument
now known as fiducial argument, at least in a very special case, seems to have
been introduced into statistical literature by E. B. Wilson [1] in connection with
the problem of inferring, from an observed relative frequency in a large sample,
the true proportion or probability p associated with a given attribute. Since
1930 the ideas and terminology surrounding the fiducial method have been
developed by R. A. Fisher [2, 3], J. Neyman [4, 5] and others into a system
for making inferences from a sample of observations about the values of
parameters which characterize the distribution of the hypothetical population
from which the sample is assumed to have been drawn. The functional form
of the population distribution law is assumed to be known. The parameters
may be means, a difference between means, variances, ranges, regression
coefficients, probabilities or any other descriptive indices or combinations of
indices which may be considered important in specifying the distribution
function of a population. In arguing fiducially about the value of a parameter,
a procedure applicable to some of the simple cases begins by the calculation
from the sample of an estimate of the parameter in question. The values of
the estimate in repeated samples of the same size will theoretically cluster
"near" the true value of the parameter according to a certain distribution law
which can, in general, be deduced from the functional form of the population
distribution law. If the distribution of the estimate involves only the one
parameter, and if, as is frequently the case, one can find a function $\psi$ of the
estimate and the parameter which has a distribution not depending on the
parameter, then one is able to set up, in a rather simple manner, fiducial limits
or a confidence interval for the parameter corresponding to the observed value
of the estimate. The limits will depend on the particular method of calculating
the estimate, the value of the estimate in the sample, and on the degree of risk
of being wrong which one is willing to take in stating that the limits will include
between them the value of the parameter for the population under consideration.
In general the smaller the degree of risk, the wider apart will be the limits.
Thus for a given pair of limits there will be an associated degree of uncertainty
that the true value of the parameter is actually included between those limits.
This uncertainty can be expressed by a probability $\alpha$ calculated from the
sampling distribution of the $\psi$ function of the parameter and estimate. Under
certain conditions, one can, by simply changing variables, obtain from the $\psi$

* An expository paper presented to the American Statistical Association on December
28, 1937, at the invitation of the Program Committee.



distribution what has been termed by Fisher a fiducial distribution function of
the parameter. From the fiducial distribution and for a given value of the
estimate one can actually determine fiducial limits of the parameter
corresponding to a given risk $\alpha$. It will be seen as we proceed that the fiducial
distribution plays no indispensable part in fiducial inference; the $\psi$ function
and its distribution, from which the fiducial distribution is derivable, are
sufficient for the fiducial argument in many cases that commonly arise in statistics.
We shall discuss fiducial argument and fiducial distributions from the point of
view of $\psi$ functions.

2. Example. To illustrate these points let us consider an example, namely,
the problem of determining fiducial limits and the fiducial distribution of the
range of a rectangular distribution for a given value of the range in a sample
"randomly drawn" from it.

If a sample of n individuals is drawn from a population whose distribution
law is $f(x, \theta) = 1/\theta$, where only values of x between 0 and $\theta$ are considered
(that is, a rectangular distribution having range $\theta$), the probability that the
range r of the sample lies between r and r + dr is $\varphi(r, \theta)\,dr$, where

(1) $\varphi(r, \theta) = \frac{n(n-1)}{\theta^{n}}\,(\theta - r)\,r^{n-2}.$

Here $\theta$ is the parameter under question, and r is the estimate; r is the difference
between the largest and smallest variate in the sample. Thus, for a given
value of $\theta$, say $\theta_0$, $\varphi(r, \theta_0)$ is a sampling distribution law defined for given
values of r on the range r = 0 to $r = \theta_0$. If we let $r/\theta = \psi$, then

(2) $\varphi(r, \theta)\,dr = n(n-1)(1-\psi)\,\psi^{n-2}\,d\psi = G(\psi)\,d\psi,$

which, from a statistical point of view, shows that if we should take an
aggregate of randomly drawn samples (of n items each) from rectangular populations
and calculate $\psi$ for each sample-population combination, then the distribution
of $\psi$ will be that given in (2). By a sample-population combination in this
example we mean any rectangular population that may arise and a "randomly
drawn" sample from it. The possible values of $\psi$ range from 0 to 1. Thus if
$\psi_\alpha$ is such that

(3) $n(n-1)\int_0^{\psi_\alpha} (1-\psi)\,\psi^{n-2}\,d\psi = \psi_\alpha^{\,n-1}\,[\,n - (n-1)\psi_\alpha\,] = \alpha,$

and if we draw a sample of n from a rectangular population, we can claim that
the probability is $1 - \alpha$ that the $\psi$ produced by this sample-population
combination will satisfy the inequality

(4) $\psi_\alpha < \psi < 1.$

It should be observed that there are many pairs of numbers, say $\psi'_\alpha$ and
$\psi''_\alpha$, such that we can claim that $\psi'_\alpha < \psi < \psi''_\alpha$ with probability $1 - \alpha$ of being





correct in making the claim. $\psi'_\alpha$ and $\psi''_\alpha$ are ordinarily chosen so that the
interval formed by them is as short as possible (or approximately so) in some
sense. Inequality (4) is equivalent to each of the following inequalities

(5) $\psi_\alpha < \frac{r}{\theta} < 1, \qquad \frac{r}{\psi_\alpha} > \theta > r.$

Now $\psi_\alpha$ can be determined from (3) when n and $\alpha$ are given. For example,
if $\alpha = .01$ and n = 10, we find from (3) that $\psi_\alpha = .495$. For a given sample,
the fiducial limits $r/\psi_\alpha$ and r can be calculated from $\psi_\alpha$ and the sample. It
will be noticed that fiducial limits are nothing more nor less than random
variables that fluctuate from sample to sample. The interval between r and
$r/\psi_\alpha$ is called a confidence interval or fiducial interval; $1 - \alpha$ is known as the
confidence coefficient [4] associated with the limits. Hence, in repeated samples
of n from a rectangular population with range $\theta_0$, $100(1 - \alpha)$ percent of the
samples will produce fiducial limits $r/\psi_\alpha$ and r which include the fixed value $\theta_0$
between them. This statement holds regardless of the value of $\theta_0$. Hence
in an aggregate of sample-population combinations, the aggregate of pairs of
fiducial limits $r/\psi_\alpha$ and r will, in $100(1 - \alpha)$ percent of the combinations,
include between them the true value of the range of the population. Furthermore,
whether there is only one rectangular population for all sample-population
combinations or many different rectangular populations, this statement
remains true, thus showing that the method of fiducial limits for inferring the
value of the parameter is independent of any a priori distribution of rectangular
populations in an aggregate of sample-population combinations, the
distribution being with respect to values of $\theta_0$.
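
The determination of $\psi_\alpha$ from (3) can be sketched numerically. The following is only an illustrative sketch, not part of the original argument: the function names `psi_alpha` and `fiducial_limits` and the use of bisection are assumptions made for the example. It solves $\psi^{n-1}[n - (n-1)\psi] = \alpha$ for $\psi$ and then forms the limits of inequality (5).

```python
def psi_alpha(n: int, alpha: float, tol: float = 1e-12) -> float:
    """Solve psi**(n-1) * (n - (n-1)*psi) = alpha on (0, 1) by bisection.

    The left side is the cumulative distribution of psi = r/theta in (3);
    it rises monotonically from 0 at psi = 0 to 1 at psi = 1.
    """
    def cdf(psi: float) -> float:
        return psi ** (n - 1) * (n - (n - 1) * psi)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fiducial_limits(r: float, n: int, alpha: float) -> tuple:
    """Return the pair (r, r / psi_alpha) appearing in inequality (5)."""
    return r, r / psi_alpha(n, alpha)


# The case quoted in the text: alpha = .01, n = 10, unit sample range.
psi = psi_alpha(n=10, alpha=0.01)
lower, upper = fiducial_limits(r=1.0, n=10, alpha=0.01)
print(round(psi, 3), round(upper, 3))
```

With a sample range of 1 the upper limit is simply $1/\psi_\alpha$, which is the multiplier applied to the sample range in the text.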

Let us look at the matter geometrically. Suppose we are drawing samples
from a rectangular population with $\theta = \theta_0$. The r for each sample is
represented by a dot along Or in Figure 1; corresponding to each dot there is a
confidence interval cutting across the V-shaped region MOR. The probability is
$1 - \alpha$ that a confidence interval computed from a sample from the population
having range $\theta_0$ will cut the line $\theta_0 K$. The cutting of $\theta_0 K$ by a confidence
interval is equivalent to the statement that $\theta_0$ is included between the
corresponding fiducial limits.

From a practical statistical point of view what we have said has the following
meaning: If on each occasion in which a randomly drawn sample of n from
some rectangular population is considered, one (i) calculates the numbers $r/\psi_\alpha$
and r, and (ii) asserts that the range in the population producing the sample
lies between these two computed limits, then in about $100(1 - \alpha)$ percent of
the cases assertion (ii) will be correct (theoretically). Thus, in dealing with
samples of 10 individuals from rectangular populations, one would be correct
(theoretically) in about 99 percent of the cases by asserting that the population
range will lie between the sample range and 2.020 times the sample
range. More generally, one need not use the same value of n all the way
through, provided that for the given $\alpha$ one evaluates $\psi_\alpha$ according to (3) for
each n that arises. It will be seen from (3) that as n increases, the value
of $\psi_\alpha$ tends to 1 and hence the fiducial limits $r/\psi_\alpha$ and r for any given sample
tend to the same value, namely the sample range, thus showing that fiducial
inferences about $\theta$ can be made arbitrarily certain by taking sufficiently large
samples.
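
The repeated-sampling meaning of the limits can be checked by simulation. The sketch below is illustrative only: the helper names, the seed, the number of trials and the two population ranges are all assumptions chosen for the example. It draws many samples of n from a rectangular population and counts how often the interval $(r, r/\psi_\alpha)$ includes the true range.

```python
import random


def psi_alpha(n, alpha, tol=1e-12):
    """Bisection solution of psi**(n-1) * (n - (n-1)*psi) = alpha, as in (3)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid ** (n - 1) * (n - (n - 1) * mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def coverage(theta, n, alpha, trials, seed=0):
    """Fraction of simulated samples whose limits (r, r/psi_alpha) include theta."""
    rng = random.Random(seed)
    psi = psi_alpha(n, alpha)
    hits = 0
    for _ in range(trials):
        xs = [rng.uniform(0.0, theta) for _ in range(n)]
        r = max(xs) - min(xs)          # sample range
        if r < theta < r / psi:
            hits += 1
    return hits / trials


# Two hypothetical population ranges; the coverage should not depend on theta.
print(coverage(theta=3.7, n=10, alpha=0.01, trials=20000))
print(coverage(theta=0.2, n=10, alpha=0.01, trials=20000, seed=1))
```

Both fractions should lie close to $1 - \alpha$, whatever value of $\theta$ is used to generate the samples.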

It is evident that the method of fiducial limits furnishes a satisfactory
procedure for inferring the value of the population range $\theta$ from samples drawn
from rectangular populations. Let us now go a step further and consider the
fiducial distribution of $\theta$ and how it fits into the scene. The cumulative
distribution of $\psi$ is

(6) $\psi^{\,n-1}\,[\,n - (n-1)\psi\,]$

which, in terms of r and $\theta$, is

(7) $F(r, \theta) = \left(\frac{r}{\theta}\right)^{n-1}\left[\,n - (n-1)\left(\frac{r}{\theta}\right)\right]$

which increases from 0 to 1 as r increases from 0 to $\theta$. Geometrically, z =
$F(r, \theta)$ can be represented as a surface defined over the region bounded by lines
$O\theta$ and OR in Figure 1, such that z is zero along $O\theta$ and is unity along the line
OR ($r = \theta$). $F(r, \theta)$ is continuous inside the region $\theta OR$, and for any given
value $r_0 \neq 0$ of r, $F(r_0, \theta)$ decreases from 1 to 0 as $\theta$ increases from $r_0$ to $\infty$.
The curves having the equations

$z = F(r, \theta_0), \qquad z = F(r_0, \theta)$

(where $\theta_0$, $r_0$, and $\alpha$ are such that $r_0/\theta_0 = \psi_\alpha$ and $F(r_0, \theta_0) = \alpha$) are the
curves C and D respectively. C is the cumulative distribution of ranges of
samples of n from a rectangular population with range $\theta_0$. The curve D has
the mathematical characteristics of a cumulative distribution function
cumulated in the negative direction with respect to $\theta$: its ordinates increase from
0 to 1 as $\theta$ decreases from $\infty$ to $r_0$. Thus, if we take $-\frac{\partial}{\partial\theta} F(r_0, \theta)$ we get a
function $g(\theta, r_0)$ which has the essential mathematical characteristics of a
distribution function: it is non-negative, can be integrated over any interval
of $\theta$, and has total area under it equal to unity. We have

(8) $g(\theta, r_0) = n(n-1)\,\frac{r_0^{\,n-1}}{\theta^{n}}\left(1 - \frac{r_0}{\theta}\right)$

and it is called the fiducial distribution of $\theta$ for $r = r_0$. It must be firmly
pointed out that $\theta$ is not a random variable and hence $g(\theta, r_0)$ is not a
distribution function of a random variable, although it has the mathematical properties
of such a distribution. Objections have been raised to the use of the term
fiducial distribution on the grounds that the thing to which it applies is not a
distribution at all. However, as long as the term is carefully defined there
should be no ambiguity in using it. From an analytical point of view, the
problem of obtaining the fiducial distribution of $\theta$ is only a matter of changing
variables, for since

(9) $\varphi(r, \theta)\,dr = g(\theta, r)\,d\theta = n(n-1)(1-\psi)\,\psi^{n-2}\,d\psi$

and $\theta_1 = r_0/\psi_\alpha$, we have

(10) $\int_{\psi_\alpha\theta_0}^{\theta_0} \varphi(r, \theta_0)\,dr = \int_{r_0}^{\theta_1} g(\theta, r_0)\,d\theta = \int_{\psi_\alpha}^{1} n(n-1)(1-\psi)\,\psi^{n-2}\,d\psi = 1 - \alpha.$

We remark again that

(11) $\int_{r_0}^{\theta_1} g(\theta, r_0)\,d\theta$

is not to be interpreted as probability as though $\theta$ were a random variable.
Instead, the meaning is as follows: Let $r_0$ be the range in a sample known to
be from some rectangular population, and let the value of $r_0$ be inserted in
(11), and let $\theta_1$ be determined so that the value of the integral is $1 - \alpha$. The
two limits for the integral are fiducial limits associated with the sample for the
confidence coefficient $1 - \alpha$, which were discussed earlier. Thus, for each
sample, we can compute fiducial limits using the fiducial distribution. These
limits, as we have seen by considering the $\psi$ function, fluctuate from sample
to sample in such a way that the probability is $1 - \alpha$ that they will include
between them the true value of the range of the population under consideration.
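
The relation between the fiducial distribution and the cumulative function F can be checked numerically. The sketch below is illustrative: the function names, the midpoint-rule integrator and the trial values of $r_0$ and $\theta_1$ are assumptions for the example. It integrates $g(\theta, r_0) = n(n-1)\,r_0^{n-1}\theta^{-n}(1 - r_0/\theta)$ from $r_0$ to $\theta_1$ and compares the result with $F(r_0, r_0) - F(r_0, \theta_1) = 1 - F(r_0, \theta_1)$.

```python
def F(r, theta, n):
    """Cumulative function F(r, theta) = (r/theta)**(n-1) * (n - (n-1)*r/theta)."""
    u = r / theta
    return u ** (n - 1) * (n - (n - 1) * u)


def g(theta, r0, n):
    """Fiducial density of theta for r = r0, i.e. minus dF(r0, theta)/dtheta."""
    return n * (n - 1) * r0 ** (n - 1) / theta ** n * (1.0 - r0 / theta)


def integrate(func, a, b, steps=100000):
    """Plain midpoint rule; adequate for this smooth integrand."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


n, r0 = 10, 1.0
theta1 = 2.5                                   # hypothetical upper limit
area = integrate(lambda t: g(t, r0, n), r0, theta1)
exact = 1.0 - F(r0, theta1, n)                 # since F(r0, r0) = 1
print(area, exact)
```

Letting $\theta_1$ grow makes $F(r_0, \theta_1)$ tend to 0, so the total area under g tends to unity, as stated above.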

3. Summary of Principles. From the point of view we have taken, the
essential notions involved in the method of fiducial argument and fiducial
distributions for the case of a continuous variate and one parameter can be
readily abstracted from the example just discussed. In general, we have the
following steps:

(a) A sample is assumed to be randomly drawn from a population with a
distribution of known functional form $f(x, \theta)$, $\theta$ being a parameter. Let
$x_1, \ldots, x_n$ be the values of x in the sample.

(b) A function, say $\psi(x_1, \ldots, x_n, \theta)$, of the sample x's and $\theta$ is devised so
that its sampling distribution $G(\psi)$ involves $\theta$ and the x's only as they
enter into $\psi$. The value of $\theta$ in $\psi$ is that for the population from which
the sample is actually drawn.

(c) Two numerical values of $\psi$, say $\psi'_\alpha$ and $\psi''_\alpha$, are chosen (ordinarily as close
together as possible) so that the probability computed from $G(\psi)$ is
$1 - \alpha$ (e.g. 0.95) that $\psi$ will lie between $\psi'_\alpha$ and $\psi''_\alpha$; more briefly
$P(\psi'_\alpha < \psi < \psi''_\alpha) = 1 - \alpha$.

(d) The inequality $\psi'_\alpha < \psi < \psi''_\alpha$, which contains only one unknown, namely $\theta$,
is solved for $\theta$, giving the equivalent inequality $\underline{\theta} < \theta < \bar{\theta}$, where $\underline{\theta}$ and $\bar{\theta}$
are fiducial limits and are subject to sampling fluctuations.

(e) The expression $P(\psi'_\alpha < \psi < \psi''_\alpha) = 1 - \alpha$ is replaced by the equivalent
expression $P(\underline{\theta} < \theta < \bar{\theta}) = 1 - \alpha$, which states that the probability is
$1 - \alpha$ that a sample will yield values $\underline{\theta}$ and $\bar{\theta}$ which will include the true
value of $\theta$ between them.

(f) The differential element for the fiducial distribution of $\theta$ is
$G(\psi)\left|\frac{\partial\psi}{\partial\theta}\right| d\theta$
(provided $\partial\psi/\partial\theta$ is a function of $\theta$ which does not change sign for a given
sample of x's) and is obtained by letting $\theta$ be the variable in $G(\psi)\,d\psi$,
keeping the x's fixed.
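
Steps (c) and (d) above can be sketched in code for any $\psi$ that is monotone in $\theta$ for a fixed sample. This is only an illustrative sketch under that monotonicity assumption; the names `solve_monotone` and `limits_from_psi`, the search bracket and the trial numbers are hypothetical and not taken from the paper.

```python
def solve_monotone(func, target, lo, hi, tol=1e-12):
    """Bisection for func(theta) = target, with func monotone on [lo, hi]."""
    increasing = func(hi) > func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # Move the end of the bracket that lies on the same side as mid.
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def limits_from_psi(psi_of_theta, psi_lo, psi_hi, theta_min, theta_max):
    """Step (d): turn psi_lo < psi < psi_hi into lower and upper limits for theta."""
    a = solve_monotone(psi_of_theta, psi_lo, theta_min, theta_max)
    b = solve_monotone(psi_of_theta, psi_hi, theta_min, theta_max)
    return min(a, b), max(a, b)


# Rectangular example: psi = r / theta with a hypothetical sample range r = 2
# and hypothetical values psi' = 0.5, psi'' = 1.
r = 2.0
theta_lower, theta_upper = limits_from_psi(lambda t: r / t, 0.5, 1.0, r, 100.0 * r)
print(theta_lower, theta_upper)
```

For the rectangular case the inversion can of course be done by hand, as in (5); the numerical route is only of interest when $\psi$ has no convenient closed-form inverse.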

To give precisely the conditions under which all of these steps can be
performed is a technical matter which will not be considered here. It is
sufficient to remark that they can be performed in many cases of practical interest.
Fiducial argument can be carried on using only the first five steps without
introducing the notion of a fiducial distribution. In connection with step (a)
it should be particularly noticed that the functional form $f(x, \theta)$ of the
population under question is assumed to be known and that the sample under
consideration is "randomly drawn" from the population. Thus, in applying
the theory to practical problems it is a matter of fundamental importance
that these two assumptions be valid. In cases where a sufficient amount of
data exists, it can usually be satisfactorily tested, by using the $\chi^2$ test and other
devices, whether or not a given functional form for $f(x, \theta)$ is a valid assumption.
In cases where sufficient data do not exist for actually making such a test,
justification for assuming a given functional form usually has to be made on the
basis of theoretical considerations. From a practical point of view the notion
of randomness is characterized by methods of drawing samples rather than
a posteriori mathematical considerations of the sample after it has been drawn,
and thus the question of randomly drawing samples depends largely upon the
experience and sound judgment of the experimenter. However, after one or
more samples have been drawn "at random," the problem of arguing from
them about the populations from which they were drawn is largely
mathematical.


4. Case of large samples. For a population with a distribution of known
functional form, a fiducial distribution of the parameter clearly depends on the
size of the sample and the particular estimate used. For example, in large
samples, we would get a fiducial distribution of the mean of a normal
population of known variance by using the sample mean which would be different
from the one obtained using the median of the sample. In order to be able to
make the inferences about $\theta$ as accurate as possible, a $\psi$ function should
theoretically be used which will produce fiducial limits which are closest together,
on the average, or perhaps "best" in some other sense, for a given $\alpha$. The
fiducial distribution obtainable from such a $\psi$ could then be referred to as the
"best" fiducial distribution, and theoretically it should be used in preference
to other possible fiducial distributions if fiducial distributions are to be used
at all to set fiducial limits. In large samples from a population with a
distribution function $f(x, \theta)$, it is known [6] that, under rather general conditions,
fiducial limits which are closest together on the average can be obtained by
letting

(12) $\psi = \frac{1}{\sqrt{n}}\,\frac{\partial L}{\partial\theta} \Big/ \sqrt{E\left[\left(\frac{\partial \log f}{\partial\theta}\right)^{2}\right]}$

and treating $\psi$ as a normally distributed variate with zero mean and unit
variance, where $L = \sum_{i=1}^{n} \log f(x_i, \theta)$, the logarithm of the likelihood of $\theta$ for
the given sample, $x_1, \ldots, x_n$ are values of x in the sample, and E denotes
mathematical expectation. For example, in the case of a binomial population
where each individual belongs either to class A or class B, we have $f(x, \theta) =
\theta^{x}(1 - \theta)^{1-x}$, where $\theta$ is the probability associated with class A, and x will be 0 or 1
according to whether an individual belongs to B or A. In a sample of n
individuals, $L = m \log\theta + (n - m)\log(1 - \theta)$, where m is the number of
individuals in class A, $E\left[\left(\frac{\partial \log f}{\partial\theta}\right)^{2}\right] = \frac{1}{\theta(1-\theta)}$, and $\psi = \frac{m - n\theta}{\sqrt{n\theta(1-\theta)}}$.
If we should want to find fiducial limits of $\theta$ for a confidence coefficient of .95,
we would solve the equations

$\frac{m - n\theta}{\sqrt{n\theta(1 - \theta)}} = \pm 1.96$

for $\theta$, thus getting two values of $\theta$, say $\underline{\theta}$ and $\bar{\theta}$. We can then say that $\underline{\theta}$ and
$\bar{\theta}$ will include the true value of $\theta$ between them with a probability of .95 of
being correct, in the sense that if we applied this rule consistently to samples
from binomial populations, we would have a procedure that would lead to a
correct statement in about 95 percent of the cases (theoretically).
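
Solving the binomial equations for $\theta$ amounts to solving a quadratic, since squaring $(m - n\theta)/\sqrt{n\theta(1-\theta)} = \pm z$ gives $(n + z^2)\theta^2 - (2m + z^2)\theta + m^2/n = 0$. The sketch below is illustrative only: the function names and the trial values m = 30, n = 100 are hypothetical and are not taken from the paper.

```python
import math


def psi(m, n, theta):
    """The large-sample psi function for the binomial case."""
    return (m - n * theta) / math.sqrt(n * theta * (1.0 - theta))


def fiducial_limits(m, n, z=1.96):
    """Roots of (n + z^2) theta^2 - (2m + z^2) theta + m^2/n = 0."""
    a = n + z * z
    b = -(2.0 * m + z * z)
    c = m * m / n
    root = math.sqrt(b * b - 4.0 * a * c)
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)


def plug_in_limits(m, n, z=1.96):
    """Commonly used limits: replace theta by m/n inside the standard error."""
    p = m / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


m, n = 30, 100                                  # hypothetical sample
low, high = fiducial_limits(m, n)
print(low, high)
print(plug_in_limits(m, n))
print(psi(m, n, low), psi(m, n, high))          # should be +1.96 and -1.96
```

The lower limit corresponds to $\psi = +1.96$ and the upper limit to $\psi = -1.96$, because $\psi$ decreases as $\theta$ increases.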

To illustrate the difference between the fiducial method and the commonly
used method of placing limits on $\theta$ for P = .95, consider an example in
which m = 150, n = 400. The usual procedure is to replace $\theta$ by m/n in
$\theta \pm 1.96\sqrt{\theta(1-\theta)/n}$, which yields .311 and .431. The fiducial procedure is
to solve the equation $\frac{m - n\theta}{\sqrt{n\theta(1-\theta)}} = \pm 1.96$ for $\theta$, thus obtaining .312 and .455.
For the case of small samples, the problem of getting "best" fiducial limits
becomes more complicated [5].

5. Extensions of Fiducial Argument. It will be observed that it is not
necessary for $\psi$ to be a function of only one statistic and $\theta$ in order to be able
to argue fiducially about $\theta$. For example, if a sample of n is drawn from a
normal population with mean $\theta$, it is well known that if $\bar{x}$ is the sample mean
then

(13) $\psi = \frac{(\bar{x} - \theta)\sqrt{n(n-1)}}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}}$

(which is Fisher's t function), has the distribution

(14) $\frac{1}{\sqrt{\pi(n-1)}}\,\frac{\Gamma\!\left(\frac{n}{2}\right)}{\Gamma\!\left(\frac{1}{2}(n-1)\right)}\,\frac{d\psi}{\left[1 + \psi^{2}/(n-1)\right]^{n/2}}.$

Here $\psi$ is a function of two statistics, namely $\bar{x}$ and $\sum_{i=1}^{n}(x_i - \bar{x})^{2}$, and the fiducial
distribution of $\theta$ for this $\psi$ function is obtained at once by applying rule (f).

The ideas of fiducial argument may be extended in other directions, but
these cannot be considered in any detail here. For example $\psi$ may be a
function of $x_1, \ldots, x_n$ and two or more population parameters, in which case one
could set up fiducial regions for the several parameters. From a practical
point of view, the fiducial argument for two or more parameters simultaneously
has hardly been touched. Again $\psi$ may be a function of statistics from two
samples, one observed and the other not yet observed, and not involving
population parameters at all, in which case one can argue fiducially about the
statistic in question for the unobserved sample [3]. The notion of a fiducial
distribution has been extended to several parameters taken simultaneously
[3, 7], but the problem of working out relations between fiducial distributions
of several parameters and fiducial regions is yet to be investigated. The
principles may be readily applied in situations in which the x's involved in $\psi$
take on discrete values. In this case the equality signs in the probability
expressions in steps (c) and (d) would be replaced by greater than or equal signs
($\geq$). Two excellent examples of the application of principles of fiducial
argument to the discrete case are furnished: (i) by a paper by Pearson and Clopper
[8] on fiducial limits of the probability P from samples from a binomial
population, and (ii) by a paper by Ricker [9] on fiducial limits of m in the Poisson
distribution $f(x, m) = m^{x} e^{-m}/x!$.



REFERENCES

[1] E. B. Wilson, "Probable Inference, the Law of Succession, and Statistical Inference,"
Jour. Amer. Statistical Association, Vol. 22 (1927), pp. 209-212.

[2] R. A. Fisher, "The Concepts of Inverse Probability and Fiducial Probability Referring
to Unknown Parameters," Proc. Royal Society of London, Series A, Vol. 139

[7] I. E. Segal, "Fiducial Distribution of Several Parameters with Application to a Normal
System," Proc. Cambridge Phil. Soc., Vol. 34 (1938), pp. 41-47.

[8] C. J. Clopper and E. S. Pearson, "The Use of Confidence or Fiducial Limits in the
Case of the Binomial," Biometrika, Vol. 26 (1934), pp. 404-413.

[9] William E. Ricker, "Fiducial Limits of the Poisson Frequency Distribution," Jour.
Amer. Statistical Association, Vol. 32 (1937), pp. 340-350.





BIOLOGICAL APPLICATIONS OF NORMAL RANGE AND ASSOCIATED 
SIGNIFICANCE TESTS IN IGNORANCE OF ORIGINAL 
DISTRIBUTION FORMS* 

By William R. Thompson 

The word normal has been used in many senses, commonly by statisticians
to designate a well-known distribution function. Another use familiar to
biologists, particularly in experimental work and medicine, is to denote an
untreated or control part of a universe, or a part whose members are free from
specified characteristics such as evidence of past or present disease or
malformation. Closely related to this last usage are attempts to delimit so-called normal
ranges of variation for a quantitative attribute of the members of part or all
of a universe in question. Interpretations are often vague, as when the interval
between the least and greatest values observed in either a large or a small
number of instances is taken to estimate a normal range. We shall consider
the problem of using ranked data for estimating normal ranges as defined in
the next paragraph.

If the instances have been drawn at random from a universe (U) of all
possible observations obtainable in a prescribed manner, and are enumerated
in ascending order of magnitude, $\{x_i\}$ for $i = 1, \ldots, n$; then it is proposed to
show in the present communication how ranges of the type $(x_k, x_{n+1-k})$ may
be used to estimate normal ranges, $R_f$, where the subscript f is the theoretical
probability that a random value, x, drawn from U will lie within the range $R_f$,
g that it will lie above, and g that it will lie below (where 2g = 1 - f).
Furthermore, it is proposed to show how these ranges may be used as the basis of
significance tests where altered conditions appear to lead to abnormal biological
variation. The form of frequency-distribution of U is supposed unknown,
and is without effect upon the analysis. Section 1 is a development of the
theory of range estimation, treated briefly in a previous paper [1] together with
illustrations of its application. Section 2 deals with significance tests.

1. The Method of Range Estimation. Let x be a real variate, a random
value drawn from an infinite universe or population U. Let f(x) be the
frequency function of x in U, supposed unknown; and $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Then
for any given $\alpha$ and $\beta$, where $\alpha < \beta$,

(1) $P(\alpha < x < \beta) = \int_{\alpha}^{\beta} f(x)\,dx.$


* Presented at a meeting of the American Statistical Association, December 28, 1937,
Atlantic City, N. J.




To facilitate development, suppose that in any finite sampling under
consideration no two values of x may be exactly the same. Let $S = \{x_k\}$, $k = 1, \ldots, n$,
denote a random sample from U, where the order of enumeration is arbitrary,
but temporarily taken as a random order (to fix the ideas, consider this the
order obtained in drawing). Let $p_k$ be defined by

(2) $p_k = \int_{-\infty}^{x_k} f(x)\,dx$, from which $dp_k = f(x_k)\,dx_k$.

Then $p_k$ is the probability that a random x from U shall be less than any number
$x_k$. Then obviously if $x_k$ is drawn at random from U, $p_k$ is a random variable
whose distribution is the unit rectangle; i.e., $P(p' < p_k < p'') = p'' - p'$.
Furthermore, the joint probability that $x_k$ will lie in the interval $(x_k, x_k + dx_k)$
and that exactly r values in the sample S will be less than $x_k$ is, to within terms
of order $dp_k$, $\frac{(n-1)!}{r!\,(n-1-r)!}\,p_k^{\,r}\,(1 - p_k)^{\,n-1-r}\,dp_k$.

Then, in repeated sampling as above, for the case where just r of the n random
values $\{x_i\}$ are less than the k-th drawn, let $P_{n,r}(p' < p_k < p'')$ denote the
probability that $p_k$ lies in the interval $(p', p'')$. Then

(3) $P_{n,r}(p' < p_k < p'') = \frac{n!}{r!\,s!}\int_{p'}^{p''} p^{r} q^{s}\,dp,$

where $s = n - 1 - r$, and $q = 1 - p$. Obviously, the expression on the right
of (3) does not depend on k if this index is the order of draft or a random index,
but only upon the condition that exactly r of the n random values from U be
less than a value $x_k$ drawn at random from the sample of n values.
Accordingly, we obtain the same result if we enumerate the n values $\{x_i\}$ in ascending
order of magnitude ($x_i < x_j$ if $i < j$). Then $k = r + 1$, in the cases
considered, and (3) may be written,

(4) $P_n(p' < p_k < p'') = \frac{n!}{(k-1)!\,(n-k)!}\int_{p'}^{p''} p^{k-1} q^{n-k}\,dp$

for $0 \leq p' \leq p'' \leq 1$. Obviously, the result is the same if we deal instead with
the k-th value ($x_k$) of every random sample S drawn. In passing it may be
noted that for $p' = 0$ and $p'' = p$ in (4) we have

(5) $P_n(p_k < p) = I_p(k, n - k + 1),$

which may be evaluated for $k,\ n - k + 1 \leq 50$ by means of the Tables of the
Incomplete Beta-Function [2].

Of course, $P_n(0 < p_k < 1) = 1$, and (4) gives $\bar{p}_k$, the mean value of $p_k$ in
repeated random sampling of n values from U, as

(6) $\bar{p}_k = \frac{n!}{(k-1)!\,(n-k)!}\int_0^1 p^{k} q^{n-k}\,dp = \frac{k}{n+1}.$



Similarly, the variance of $p_k$ is given by

(7) $\sigma^{2}_{p_k} = E[(p_k - \bar{p}_k)^{2}] = \frac{k(n - k + 1)}{(n + 1)^{2}(n + 2)}.$

Now suppose that we want to find a range $(\alpha, \beta)$ such that, in random drafts
from U, the theoretical relative frequency of drawing x less than $\alpha$ is g, and
the same as that of drawing x greater than $\beta$. $(\alpha, \beta)$ may be called a central
confidence range with a confidence f = 1 - 2g that x drawn at random from U
will lie within the range. For g = k/(n + 1) we may take the range $R_f =
(x_k, x_{n-k+1})$; and likewise with g = 5% we may estimate, or approximate
by interpolation where 20k > n + 1 > 20(k - 1), a range $R_f$ for normal
biological variation of a specified character, and this may be called briefly the
estimated 90% central normal range.

Of course the probability of drawing $x < \alpha$ is $\int_{-\infty}^{\alpha} f(x)\,dx$, and that of drawing
$x > \beta$ is $\int_{\beta}^{\infty} f(x)\,dx$; and these probabilities are unknown, as the frequency
function f(x) is unknown; but with $\alpha = x_k$ and $\beta = x_{n-k+1}$ the theoretical
relative frequency in each case is k/(n + 1) regardless of the universe.
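
The range estimate can be sketched as a short routine on ranked data. This is only an illustration: the function name is hypothetical, and the use of linear interpolation between adjacent ranked values is an assumption, since the text does not specify the interpolation rule.

```python
def central_normal_range(values, g):
    """Estimate (x_k, x_{n+1-k}) with k = g * (n + 1), interpolating if k is fractional.

    Linear interpolation between adjacent ranked values is an assumption
    made for this sketch; it is not prescribed by the text.
    """
    xs = sorted(values)
    n = len(xs)
    k = g * (n + 1)
    if k < 1:
        raise ValueError("sample too small to estimate this range")

    def ranked_value(rank):
        whole = int(rank)              # ranks are counted from 1
        frac = rank - whole
        if frac == 0:
            return xs[whole - 1]
        return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])

    return ranked_value(k), ranked_value(n + 1 - k)


# n = 19 gives k = 1 for g = 5%, so the 90% range is (least, greatest).
print(central_normal_range(range(1, 20), 0.05))

# n = 62 gives k = 3.15, which calls for interpolation between ranks 3 and 4.
print(central_normal_range(range(1, 63), 0.05))
```

With n = 62 the condition 20k > n + 1 > 20(k - 1) is met by k = 4, so the 5% points fall between the third and fourth ranked values at either end.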

It has been shown [1] also that if the sample S were drawn at random from
a finite ordered population of aggregate number N, denoted by $U_N$, and $N p_k$
is the number of values in $U_N$ that are less than the k-th member of the given
random sample in ascending order of magnitude; then, for S a sample of n
values as before, the mean value of $p_k$ in repeated sampling is

$\bar{p}_k = \frac{k}{n+1}\left(1 + \frac{1}{N}\right) - \frac{1}{N}$, and

$\sigma^{2}_{p_k} = \frac{k(n - k + 1)}{(n + 1)^{2}(n + 2)}\left(1 + \frac{1}{N}\right)\left(1 - \frac{n}{N}\right).$
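
For a small finite population the two expressions can be confirmed by enumerating every possible sample. The sketch below is illustrative: the function names and the choices of N, n and k are assumptions. It takes $\bar{p}_k = \frac{k}{n+1}(1 + \frac{1}{N}) - \frac{1}{N}$ and $\sigma^2_{p_k} = \frac{k(n-k+1)}{(n+1)^2(n+2)}(1 + \frac{1}{N})(1 - \frac{n}{N})$ as the formulas to be checked.

```python
from fractions import Fraction
from itertools import combinations


def enumerated_moments(N, n, k):
    """Exact mean and variance of p_k over all samples of n from the ranks 1..N."""
    values = []
    for sample in combinations(range(1, N + 1), n):   # tuples come out sorted
        below = sample[k - 1] - 1      # population members less than the k-th
        values.append(Fraction(below, N))
    mean = sum(values, Fraction(0)) / len(values)
    variance = sum(((p - mean) ** 2 for p in values), Fraction(0)) / len(values)
    return mean, variance


def formula_moments(N, n, k):
    """The finite-population expressions for the mean and variance of p_k."""
    mean = Fraction(k, n + 1) * (1 + Fraction(1, N)) - Fraction(1, N)
    variance = (Fraction(k * (n - k + 1), (n + 1) ** 2 * (n + 2))
                * (1 + Fraction(1, N)) * (1 - Fraction(n, N)))
    return mean, variance


print(enumerated_moments(6, 3, 2))
print(formula_moments(6, 3, 2))
```

As N increases with n and k fixed, both expressions tend to the infinite-universe values in (6) and (7).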

An example is furnished by an analysis of data reported by Wadsworth and
Hyman [3] in a study of influences of antigenic treatment of horses upon their
plasma concentration of esterified cholesterol, free cholesterol, and
phospholipids. As in chart 1 for normal horses, a graph has been constructed for each
horse studied, using time as abscissa and a logarithmic ordinate scale for
observed values of plasma concentration of the constituents:

1. Esterified Cholesterol,

2. Free Cholesterol, and

3. Phospholipids times one-tenth,

the respective successive points for each being joined to form three polygon
curves. As these are in all cases discrete and lie in the order of enumeration
from top to bottom of the graph, no special label seemed needed; but estimated
normal ranges for the central 90% of variation have been indicated in each
case by two horizontal lines between brackets at the right, numbered to
correspond with the enumeration above. The ranges are based on observations on
62 plasma samples, each from a different presumably normal horse. The normal
horses in the chart show about the same individual variations; but, of course,
the ranges are not to be interpreted to indicate normal variation for an
individual animal.

[Chart 1, NORMAL HORSES: one panel per horse, abscissa in days, logarithmic ordinate scale.]

Chart 1. On each graph for a given normal horse, the number of which appears below,
the curves in descending order respectively represent (1) esterified cholesterol, (2) free
cholesterol, and (3) one-tenth phospholipid concentration in plasma (in mg. per 100 cc.).
Corresponding 90-per-cent normal range estimates are indicated.

Chart 2 presents in like manner the data obtained for horses under
immunization against tetanus and the streptococcus. The tetanus immunization
treatment appears to produce marked and sustained depression in all three curves
of at least five of the six animals observed.

That this is statistically significant seems obvious. A single observation
below the 90% normal range should be expected once in twenty random trials
if normal causes of variation may be assumed unaffected by the treatment in
question. The expectation of obtaining 5 or more such values in six independent
trials is obviously much less, and may be accurately estimated by means of
relations developed in the following section.
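
A rough order of magnitude for this expectation can be had from a plain binomial calculation. The sketch below is a simplification and not the method of the following section: it assumes the lower limit of the range is exactly the 5% point of the normal universe, ignoring the sampling error of the estimated range. The function name is hypothetical.

```python
from math import comb


def binomial_tail(trials, at_least, p):
    """Probability of at_least or more successes in the given number of trials."""
    return sum(comb(trials, j) * p ** j * (1.0 - p) ** (trials - j)
               for j in range(at_least, trials + 1))


# Five or more of six independent values below a limit exceeded downward
# once in twenty trials (p = .05), under the simplifying assumption above.
tail = binomial_tail(6, 5, 0.05)
print(tail)
```

Because the limits of the range are themselves estimated from 62 samples, the exact treatment has to allow for their sampling variation, which is what the relations of section 2 provide.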



BIOLOGICAL APPLICATIONS OF NORMAL .RANGE 


285 


Significance Tests. Now consider, as in section 1, another sample $S'$ of $n'$ values, $\{x'_{k'}\}$, $k' = 1, \dots, n'$ (where $x'_i < x'_j$ if $i < j$), drawn at random from an infinite universe $U'$ as was $S$ from $U$; but where $U'$ and $U$ are not necessarily the same universe. In like manner it may be shown that, if $x'$ is drawn at random from $U'$ and $p'_{k'}$ denotes $P(x' < x'_{k'})$, then

$$P(\phi' < p'_{k'} < \phi'') = \frac{n'!}{v!\,w!}\int_{\phi'}^{\phi''} p^{v} q^{w}\,dp, \tag{8}$$

where $q = 1 - p$, $v = k' - 1$, $w = n' - k'$, and $0 \le \phi' \le \phi'' \le 1$.

Chart 2. On each graph for horses receiving the indicated antigenic treatment and one untreated horse, the curves in descending order respectively represent (1) esterified cholesterol, (2) free cholesterol, and (3) one-tenth phospholipid concentration in plasma (in mg. per 100 cc.). Corresponding 90-per-cent normal range estimates are indicated.

The probabilities in (4) and (8) are independent, obviously, whether $U'$ is the
same as $U$ or not. Accordingly, these relations make possible an evaluation








WILLIAM B. THOMPSON 


of $P(p_k < p'_{k'})$ under the circumstances where repeated sampling is applied both to the case of $S$ and to that of $S'$. With this understanding, then

$$P(p_k < p'_{k'}) = \frac{(r + s + 1)!\,(v + w + 1)!}{r!\,s!\,v!\,w!}\int_0^1 p^{r} q^{s}\int_p^1 p_0^{v} q_0^{w}\,dp_0\,dp, \tag{9}$$

where, as before, $r = k - 1$, $s = n - k$, $v = k' - 1$, $w = n' - k'$, $q = 1 - p$, and $q_0 = 1 - p_0$.

In a previous paper [4] a $\psi$-function was defined as

$$\psi(r, s, r', s') = \frac{\displaystyle\sum_{\alpha=0}^{r'}\binom{r + r' - \alpha}{r}\binom{s + s' + 1 + \alpha}{s}}{\displaystyle\binom{r + s + r' + s' + 2}{r + s + 1}} \tag{10}$$

for any four rational integers $r, s, r', s' \ge 0$; and it was shown in detail that the right member of (9) is equal to $\psi(r, s, v, w)$; whence we may write

$$P(p_k < p'_{k'}) = \psi(k - 1,\ n - k,\ k' - 1,\ n' - k'). \tag{11}$$

Obviously, if $U$ and $U'$ are the same universe, then $p_k < p'_{k'}$ if and only if $x_k < x'_{k'}$, and then we have

$$P(x_k < x'_{k'}) = \psi(k - 1,\ n - k,\ k' - 1,\ n' - k') \tag{12}$$

in repeated random sampling applied to both sample types, $S$ and $S'$, respectively of $n$ and of $n'$ observations. In the paper just mentioned, and in another [5], the $\psi$-function was further developed by extension of definition to include $\psi(r, s, -1, s') = 0$, and it was shown that

$$\psi(r, s, r', s') \equiv \psi(r, r', s, s') \equiv 1 - \psi(r', s', r, s) \equiv 1 - \psi(s, r, s', r'). \tag{13}$$

Further demonstrations [5] included the relation, 



which offers another form for calculation. The identities in (13) are particularly useful to facilitate calculation where one of the four arguments is small. A system for forming a table has also been developed [4, 5] in an economical form, but tabulation has been given only for the arguments not exceeding 5.

Now, in applying a test based on relation (12) or on that for the complementary probability, $P(x'_{k'} < x_k)$, which obviously, by (13), equals $\psi(n - k,\ k - 1,\ n' - k',\ k' - 1)$, we may wish to exclude from the normal set of observations those values obtained from animals later given the treatment in question in the statistical significance test. The purpose would be to avoid violation of the condition of independent sampling required. In the case of the tetanus antigen treatment, we have an experience wherein 5 or more of 6 horses treated yield




values for a given plasma constituent less than the third in ascending order of magnitude (namely $x_3$) in our independent set of normal values. Here $n' = 6$, and $n = 62 - 6 = 56$. In accordance with the hypothesis that the treatment in question does not affect normal causes of variation in the plasma constituent under investigation we have $P(x'_{k'} < x_3)$ is $\psi(53, 2, 6 - k', k' - 1)$. This is approximately $1.891(10)^{-5}$ for $k' = 5$, and $4.555(10)^{-7}$ for $k' = 6$. Obviously, a rule for establishing the value of $k$ to be used in such tests should be fixed in advance without prejudice, as in the present case where we have taken $k \ge g(n + 1) > k - 1$ for $g = 5\%$.

In the case of streptococcus immunization treatment, the corresponding test would have $n = 58$, $n' = 4$, $k = 3$, and $k' = 4$, 3, or 2; which would yield approximately $2.689(10)^{-5}$, $1.031(10)^{-3}$, or $1.817(10)^{-2}$, respectively, for $P(x'_{k'} < x_3)$. Thus it appears that where such values are found (intuitively it would appear a fortiori if we compare instead with $x_3$ of the entire normal set of 62 values), their low magnitude appears to discredit the hypothesis that such discrepancies are ascribable to mere chance normal variation in the quantitative attribute investigated.
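
The probabilities quoted above can be recomputed with a few lines of code. The following is a minimal sketch, assuming the $\psi$-function has the form $\psi(r, s, r', s') = \sum_{\alpha=0}^{r'}\binom{r+r'-\alpha}{r}\binom{s+s'+1+\alpha}{s} \big/ \binom{r+s+r'+s'+2}{r+s+1}$ and that $P(x'_{k'} < x_k) = \psi(n-k,\ k-1,\ n'-k',\ k'-1)$; the function names `psi` and `prob_treated_below` are illustrative and do not come from the paper.

```python
from math import comb

def psi(r, s, r2, s2):
    """Assumed form of the psi-function; psi(r, s, -1, s') is taken as 0."""
    if r2 < 0:
        return 0.0
    numerator = sum(comb(r + r2 - alpha, r) * comb(s + s2 + 1 + alpha, s)
                    for alpha in range(r2 + 1))
    return numerator / comb(r + s + r2 + s2 + 2, r + s + 1)

def prob_treated_below(n, k, n2, k2):
    """P(x'_{k'} < x_k) for a normal sample of n and a treated sample of n2 values."""
    return psi(n - k, k - 1, n2 - k2, k2 - 1)

# Tetanus case: n = 56 normal values, n' = 6 treated horses, k = 3.
for k2 in (5, 6):
    print("tetanus, k' =", k2, prob_treated_below(56, 3, 6, k2))

# Streptococcus case: n = 58, n' = 4, k = 3.
for k2 in (4, 3, 2):
    print("streptococcus, k' =", k2, prob_treated_below(58, 3, 4, k2))
```

The printed values can be set beside the approximations quoted in the text; small differences in the last digit may reflect rounding in the original computation.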

The tests proposed are free from any assumption concerning the form of the original distribution $f(x)$. The illustrative material is only a part of that presented with similar statistical treatment in the paper of Wadsworth and Hyman [3], which makes it apparent that the tests suggested here may be useful and powerful in analysis of biological and other experimental data. From a similar point of view, Hotelling and Pabst [6] developed tests of bi-variate correlation, and Milton Friedman has elaborated a multi-variate rank analysis [7], the tests being likewise free from any assumption about the form of the original distributions. In a previous paper [1] confidence ranges for the median are based similarly, employing relation (5) for the special case $p = \tfrac{1}{2}$.

Division of Laboratories and Research 
New York State Department of Health 
Albany, N. Y. 


REFERENCES 

[1] W. R. Thompson, Annals of Mathematical Statistics, Vol. 7 (1936), p. 122.

[2] Tables of the Incomplete Beta-function, edited by Karl Pearson (Office of Biometrika, University College, London), 1934, p. 494.

[3] Augustus Wadsworth and L. W. Hyman, Jour. Immunol., Vol. 35 (1938), p. 85.

[4] W. R. Thompson, Biometrika, Vol. 25 (1933), p. 285.

[5] W. R. Thompson, American Journal of Mathematics, Vol. 57 (1935), p. 450.

[6] H. Hotelling and M. R. Pabst, Annals of Mathematical Statistics, Vol. 7 (1936), p. 29.

[7] Milton Friedman, Jour. Amer. Stat. Assoc., Vol. 32 (1937), p. 675.



THE COMPUTATION OF MOMENTS WITH THE USE 
OF CUMULATIVE TOTALS 

By Paul S. Dwyer 

1. Introduction. Various authors have shown how the moments of a frequency distribution may be computed from cumulated frequencies.¹ In order to make clear to the reader the type of technique under discussion there is presented an illustration which is, essentially, that used by Hardy [2, p. 59]. The value $\sum f_x = 729$ is the last entry in column 4.

We use $C_0^1$ to denote the entry in column 4 which is opposite the smallest variate (or class mark if the distribution is grouped). Similarly $C_1^1$ is the entry above $C_0^1$, and $C_0^2$ the entry to the right of $C_0^1$, etc. In this notation the diagonal entries, the ones underscored in Table I, are $C_0^1$, $C_1^2$, $C_2^3$, $C_3^4$, $C_4^5$.

The moments² about the smallest variate can be expressed in terms of the cumulations of Table I in different ways. One method utilizes the diagonal entries and the differences of zero. Thus

$$\sum_0^6 x f_x = C_1^2 = 2916; \qquad \sum_0^6 x^2 f_x = C_1^2 + 2C_2^3 = 12636;$$

$$\sum_0^6 x^3 f_x = C_1^2 + 6C_2^3 + 6C_3^4 = 57996;$$

$$\sum_0^6 x^4 f_x = C_1^2 + 14C_2^3 + 36C_3^4 + 24C_4^5 = 278316, \text{ etc.}$$

A second method utilizes the entries in the next to the last row and the differences of zero. Thus

$$\sum_0^6 x f_x = C_1^2 = 2916; \qquad \sum_0^6 x^2 f_x = -C_1^2 + 2C_1^3 = 12636;$$

$$\sum_0^6 x^3 f_x = C_1^2 - 6C_1^3 + 6C_1^4 = 57996;$$

$$\sum_0^6 x^4 f_x = -C_1^2 + 14C_1^3 - 36C_1^4 + 24C_1^5 = 278316, \text{ etc.}$$

¹ The reader is referred to references [1], ..., [15], at end of paper.

² It is to be noted that we are not talking about moments per unit frequency. We are using the term in the sense used for example by Whittaker and Robinson. See [20, p. 18].



A third method, which seems to have escaped previous attention, involves columnar entries and multipliers whose determination and properties are a chief concern of this paper. Thus

$$\sum_0^6 x f_x = C_1^2 = 2916; \qquad \sum_0^6 x^2 f_x = C_1^3 + C_2^3 = 12636;$$

$$\sum_0^6 x^3 f_x = C_1^4 + 4C_2^4 + C_3^4 = 57996;$$

$$\sum_0^6 x^4 f_x = C_1^5 + 11C_2^5 + 11C_3^5 + C_4^5 = 278316, \text{ etc.}$$

It is possible also to obtain formulas when the cumulations are made from the smallest variate to the largest variate and, indeed, the whole theory of the present paper could be duplicated with such a theory of cumulation.


TABLE I

Successive Frequency Cumulations

  (1)     (2)   (3)    (4)    (5)     (6)     (7)     (8)
   X       x    f_x    C^1    C^2     C^3     C^4     C^5

 a + 6     6     64     64     64      64      64      64
 a + 5     5    192    256    320     384     448     512
 a + 4     4    240    496    816    1200    1648    2160
 a + 3     3    160    656   1472    2672    4320    6480
 a + 2     2     60    716   2188    4860    9180   15660
 a + 1     1     12    728   2916    7776   16956   32616
 a         0      1    729   3645   11421   28377   60993

Underscored (diagonal) entries: 729, 2916, 4860, 4320, 2160.
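
The construction of Table I and the third (columnar) method of section 1 can be sketched in a few lines. This is a minimal illustration, not the author's procedure: the helper names `cumulations` and `entry` are hypothetical, and the multiplier lists are simply the coefficients appearing in the columnar formulas of section 1, with a leading 0 for the entry opposite the smallest variate.

```python
from itertools import accumulate

# Frequencies of Table I, listed from the largest variate (x = 6) down to x = 0.
freq_desc = [64, 192, 240, 160, 60, 12, 1]

def cumulations(values_desc, order):
    """Return the columns C^1, ..., C^order; each is a running total of the
    previous column, started at the largest variate."""
    columns, column = [], list(values_desc)
    for _ in range(order):
        column = list(accumulate(column))
        columns.append(column)
    return columns

cols = cumulations(freq_desc, 5)

def entry(j, x):
    """Entry of column C^j opposite variate x (x = 0 is the last row)."""
    return cols[j - 1][len(freq_desc) - 1 - x]

# Columnar multipliers for the s-th moment, applied to C^(s+1) at x = 0, 1, ..., s.
multipliers = {1: [0, 1], 2: [0, 1, 1], 3: [0, 1, 4, 1], 4: [0, 1, 11, 11, 1]}

moments = {s: sum(m * entry(s + 1, x) for x, m in enumerate(ms))
           for s, ms in multipliers.items()}
print(moments)
```

Only additions are needed to build the columns; the multiplications are confined to the last few entries of a single column.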


It is possible to obtain the columnar formulas from the well known diagonal formulas. From the construction of Table I it is clear that

$$C_i^{j} = C_{i+1}^{j} + C_i^{j-1} \tag{1}$$

so that

$$C_1^2 = C_1^2; \qquad C_1^2 + 2C_2^3 = C_1^3 + C_2^3; \qquad C_1^2 + 6C_2^3 + 6C_3^4 = C_1^4 + 4C_2^4 + C_3^4;$$

$$C_1^2 + 14C_2^3 + 36C_3^4 + 24C_4^5 = C_1^5 + 11C_2^5 + 11C_3^5 + C_4^5. \tag{1'}$$

Formula (1) can be used similarly in deriving columnar formulas from row formulas, diagonal formulas from row formulas, etc.

The columnar method is here recommended as a useful substitute for the usual elementary method of computing moments. The many multiplications involved in the usual process are replaced by continued addition. The chief disadvantage of the method is the continual recording, although this obstacle is surmounted with an adding machine equipped with a recording tape. The resulting moments are easily checked with an adaptation of Charlier's check, as is shown in section 8, and methods are given by which the multipliers are easily obtained. The method is also well adapted to the use of Hollerith machines.

The introduction of such columnar multipliers tends to give a different emphasis to the cumulative totals technique. The use of diagonal entries led logically to an emphasis upon factorial moments, while the columnar method tends to emphasize the more familiar power moments. The primary application here indicated is not to elaborate and specialized techniques, but rather to the simple, though often tedious, problem of the computation of power moments.

The aims of this paper are then: 

(1) To show how moments may be computed from the columnar values of the
successive cumulations,

(2) To discover the properties of the columnar multipliers, 

(3) To present a general theory for computation of moments using cumulative 
totals. 


2. The Basic Cumulative Theorem. The use of (1) is not satisfactory in getting precise formulas for the columnar multipliers so we derive the columnar cumulative theory directly from first principles. We first prove

Theorem I. Let $x$ be any real number and let $u_x$ be a real function of $x$ which is 0 when $x < a$ and when $x > a + k$ and which is not infinite for $x = a, a + 1, a + 2, \dots, a + k$. Let $v_x$ be a real function of $x$ and $\underline{v}_x$, called range $v_x$, a function such that $\underline{v}_x = v_x$ when $x = a, a + 1, \dots, a + k$ and $\underline{v}_x = 0$ at all points outside the range $a$ to $a + k$. If $\sum_{t=x}^{a+k} u_t$ is indicated by $Cu_x$ and $v_x - v_{x-1}$ by $\nabla v_x$, $\underline{v}_x - \underline{v}_{x-1}$ by $\nabla\underline{v}_x$, then

$$\sum_a^{a+k} u_x v_x = \sum_a^{a+k} u_x \underline{v}_x = \sum_a^{a+k} Cu_x\,\nabla\underline{v}_x. \tag{3}$$

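The values $u_x$, $v_x$, $Cu_x$, $\nabla\underline{v}_x$ are presented in Table II. The theorem is proved by forming

$$\sum_a^{a+k} Cu_x\,\nabla\underline{v}_x = u_{a+k}v_{a+k} + \cdots + u_{a+i}v_{a+i} + \cdots + u_a v_a = \sum_a^{a+k} u_x v_x,$$

each $v_{a+i}$ entering the sum with the coefficient $Cu_{a+i} - Cu_{a+i+1} = u_{a+i}$.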
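
Theorem I is a summation-by-parts identity and is easy to check numerically. The sketch below assumes $u_x$ and $v_x$ are supplied as lists over $x = a, \dots, a + k$ in ascending order; the function name `check_theorem_one` and the sample values are illustrative only.

```python
def check_theorem_one(u, v):
    """Return (sum u_x v_x, sum Cu_x * nabla(range v_x)) for x = a, ..., a + k."""
    n = len(u)
    # Cu_x: total of u_t for t >= x, i.e. cumulation from the largest variate down.
    cu = [sum(u[i:]) for i in range(n)]
    # Backward difference of the range function, which is 0 below x = a,
    # so the first difference is v_a itself.
    dv = [v[i] - (v[i - 1] if i > 0 else 0) for i in range(n)]
    lhs = sum(ui * vi for ui, vi in zip(u, v))
    rhs = sum(c * d for c, d in zip(cu, dv))
    return lhs, rhs

# Example: the frequencies of Table I with v_x = x^2 on x = 3, ..., 9 (a = 3).
u = [1, 12, 60, 160, 240, 192, 64]
v = [x ** 2 for x in range(3, 10)]
print(check_theorem_one(u, v))
```

The two returned numbers agree for any choice of `u` and `v` of equal length, which is the content of the theorem.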

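Theorem I can also be written as

$$\sum_0^k u_{a+x} v_{a+x} = \sum_0^k u_{a+x}\,\underline{v}_{a+x} = \sum_0^k Cu_{a+x}\,\nabla\underline{v}_{a+x}.$$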


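3. The Successive Cumulation Theorem.

Theorem II. If $C^2u_x = C[Cu_x]$ and $\nabla^2\underline{v}_x = \nabla(\nabla\underline{v}_x)$, etc., then

$$\sum_a^{a+k} u_x v_x = \sum_a^{a+k} u_x \underline{v}_x = \sum_a^{a+k} C^{s+1}u_x\,\nabla^{s+1}\underline{v}_x.$$

This theorem follows readily from Theorem I. If $U_x = Cu_x$ and $V_x = \nabla\underline{v}_x$, then

$$\sum_a^{a+k} u_x v_x = \sum_a^{a+k} u_x \underline{v}_x = \sum_a^{a+k} U_x V_x = \sum_a^{a+k} CU_x\,\nabla V_x = \sum_a^{a+k} C^{2}u_x\,\nabla^{2}\underline{v}_x.$$

This process can be extended as many times as desired so that

$$\sum_a^{a+k} u_x v_x = \sum_a^{a+k} u_x \underline{v}_x = \sum_a^{a+k} C^{s+1}u_x\,\nabla^{s+1}\underline{v}_x. \tag{5}$$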

TABLE II

Values of x, u_x, v_x, Cu_x, and ∇(range v_x)

  x            u_x          v_x          Cu_x                                              ∇(range v_x)

  a + k        u_{a+k}      v_{a+k}      u_{a+k}                                           v_{a+k} - v_{a+k-1}
  a + k - 1    u_{a+k-1}    v_{a+k-1}    u_{a+k} + u_{a+k-1}                               v_{a+k-1} - v_{a+k-2}
  ...          ...          ...          ...                                               ...
  a + i        u_{a+i}      v_{a+i}      u_{a+k} + ... + u_{a+i}                           v_{a+i} - v_{a+i-1}
  ...          ...          ...          ...                                               ...
  a + 1        u_{a+1}      v_{a+1}      u_{a+k} + ... + u_{a+i} + ... + u_{a+1}           v_{a+1} - v_a
  a            u_a          v_a          u_{a+k} + ... + u_{a+i} + ... + u_{a+1} + u_a     v_a


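This can also be written as

$$\sum_0^k u_{a+x} v_{a+x} = \sum_0^k u_{a+x}\,\underline{v}_{a+x} = \sum_0^k C^{s+1}u_{a+x}\,\nabla^{s+1}\underline{v}_{a+x}. \tag{6}$$

In order to determine the values $\nabla^{s+1}\underline{v}_{a+x}$, $0 \le x \le k$, we note that

$$\nabla^{s+1}v_{a+x} = \sum_{t=0}^{s+1}(-1)^{t}\binom{s+1}{t}v_{a+x-t}, \tag{7}$$

so that

$$\nabla^{s+1}\underline{v}_{a+x} = \sum_{t=0}^{s+1}(-1)^{t}\binom{s+1}{t}\underline{v}_{a+x-t}. \tag{8}$$

We also know that, for $x \le k$,

$$\underline{v}_{a+x-t} = v_{a+x-t} \ \text{ when } t \le x, \qquad \underline{v}_{a+x-t} = 0 \ \text{ when } t > x, \tag{9}$$

so that

$$\nabla^{s+1}\underline{v}_{a+x} = \sum_{t=0}^{x}(-1)^{t}\binom{s+1}{t}v_{a+x-t}, \qquad 0 \le x \le s, \tag{10}$$

$$\nabla^{s+1}\underline{v}_{a+x} = \sum_{t=0}^{s+1}(-1)^{t}\binom{s+1}{t}v_{a+x-t} = \nabla^{s+1}v_{a+x}, \qquad s < x \le k. \tag{11}$$

The formula (6) can then be written

$$\sum_a^{a+k} u_x v_x = \sum_0^k u_{a+x}v_{a+x} = \sum_{x=0}^{s} C^{s+1}u_{a+x}\,\nabla^{s+1}\underline{v}_{a+x} + \sum_{x=s+1}^{k} C^{s+1}u_{a+x}\,\nabla^{s+1}v_{a+x}. \tag{12}$$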

4. Moments from the Cumulated Frequencies. If $u_{a+x} = f_{a+x}$ and $v_{a+x} = (a + x)^{s}$, then (6) gives

$$\sum_0^k (a + x)^{s} f_{a+x} = \sum_0^k C^{s+1}f_{a+x}\,\nabla^{s+1}\underline{(a + x)^{s}}. \tag{13}$$

A more useful formula, obtained from (12), is

$$\sum_0^k (a + x)^{s} f_{a+x} = \sum_0^s C^{s+1}f_{a+x}\,\nabla^{s+1}\underline{(a + x)^{s}}, \tag{14}$$

since $\nabla^{s+1}(a + x)^{s} = 0$. We have then

Theorem III. The values of the $s$-th moments can be obtained from the last $s + 1$ entries of the $(s + 1)$st cumulation of the frequencies. The multipliers are the values

$$\nabla^{s+1}\underline{(a + x)^{s}} = \sum_{t=0}^{x}(-1)^{t}\binom{s+1}{t}(a + x - t)^{s}. \tag{15}$$

Cor. 1. When $a = 0$, i.e., when the moments are measured about the smallest variate, the multipliers are

$$\nabla^{s+1}\underline{x^{s}} = \sum_{t=0}^{x}(-1)^{t}\binom{s+1}{t}(x - t)^{s}. \tag{16}$$

Cor. 2. When $a = 1$, the multipliers are

$$\nabla^{s+1}\underline{(1 + x)^{s}} = \sum_{t=0}^{x}(-1)^{t}\binom{s+1}{t}(1 + x - t)^{s}. \tag{17}$$

Cor. 3. If the moments are measured about a fixed value, $p$, then the new smallest variate is $a - p = a'$ and the multipliers are $\nabla^{s+1}\underline{(a' + x)^{s}}$.

Cor. 4. If $p$ is the mean, $m$, then $a' = a - m$. If in addition $a = 0$, then $a' = -m$ and the multipliers giving moments about the mean are $\nabla^{s+1}\underline{(x - m)^{s}}$. Now

$$m = \frac{C_0^2\,\nabla^2\underline{0} + C_1^2\,\nabla^2\underline{1}}{C_0^1} = \frac{C_1^2}{C_0^1}.$$

It follows that the multipliers giving the moments about the mean are

$$\nabla^{s+1}\underline{\left(x - \frac{C_1^2}{C_0^1}\right)^{s}}. \tag{18}$$

It is to be noted that the moments about different points are obtained by applying different multipliers to the same cumulated frequencies.
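
Theorem III can be sketched directly in code. The following is a minimal illustration under the assumption that the multipliers are $\sum_{t=0}^{x}(-1)^t\binom{s+1}{t}(a+x-t)^s$ for $x = 0, \dots, s$ and that they are applied to the last $s+1$ entries of the $(s+1)$-th cumulation; the function names `multipliers` and `moment` are hypothetical.

```python
from itertools import accumulate
from math import comb

def multipliers(s, a=0):
    """Columnar multipliers for the s-th moment when the smallest variate is a."""
    return [sum((-1) ** t * comb(s + 1, t) * (a + x - t) ** s for t in range(x + 1))
            for x in range(s + 1)]

def moment(freq_asc, s, a=0):
    """sum (a + x)^s f_{a+x}, from the last s + 1 entries of the (s+1)-th cumulation.

    freq_asc holds the frequencies from the smallest variate upward.
    """
    column = freq_asc[::-1]                  # cumulate from the largest variate down
    for _ in range(s + 1):
        column = list(accumulate(column))
    tail = column[::-1][: s + 1]             # entries opposite x = 0, 1, ..., s
    return sum(m * c for m, c in zip(multipliers(s, a), tail))

freq_asc = [1, 12, 60, 160, 240, 192, 64]    # Table I, x = 0..6
print([moment(freq_asc, s) for s in range(1, 5)])        # about the smallest variate
print([moment(freq_asc, s, a=1) for s in range(1, 5)])   # smallest variate taken as 1
```

Changing `a` changes only the multipliers; the cumulated column is the same in every case, which is the point made at the end of the section.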


5. Values of the multipliers. The values of the multipliers may be computed from (15). Thus $\nabla^{3}\underline{(a + 1)^{2}} = (a + 1)^2 - 3a^2 = -2a^2 + 2a + 1$. This becomes $2ab + 1$ when $1 - a$ is set equal to $b$. Values of the multipliers for the most common values of $s$ and $a$ are presented in Table III.


TABLE III

Values of ∇^{s+1} (a + x)^s (range function), with b = 1 - a

  x \ s    0     1     2          3                  4

    4                                                b^4
    3                             b^3                4ab^3 + 6b^2 + 4b + 1
    2                  b^2        3ab^2 + 3b + 1     6a^2b^2 + 12ab + 11
    1            b     2ab + 1    3a^2b + 3a + 1     4a^3b + 6a^2 + 4a + 1
    0      1     a     a^2        a^3                a^4


When $a = 0$, $b = 1$ and the multipliers are 1; 0, 1; 0, 1, 1; 0, 1, 4, 1; 0, 1, 11, 11, 1; etc., as indicated in section 1. When $a = 1$, $b = 0$ and the multipliers are 1; 1, 0; 1, 1, 0; 1, 4, 1, 0; 1, 11, 11, 1, 0; etc. When the moments are measured about a fixed point, $p$, it is only necessary to compute $a' = a - p$ and to use $a'$ for $a$ and $b' = 1 - a'$ for $b$ in Table III.

We illustrate the use of the multipliers by application to the problem of Table I. The moments about the smallest variate are computed in section 1. The moments, when $a = 1$, are

$$\sum_0^6 (x + 1) f_x = C_0^2 = 3645; \qquad \sum_0^6 (x + 1)^2 f_x = C_0^3 + C_1^3 = 19197;$$

$$\sum_0^6 (x + 1)^3 f_x = C_0^4 + 4C_1^4 + C_2^4 = 105381; \qquad \sum_0^6 (x + 1)^4 f_x = C_0^5 + 11C_1^5 + 11C_2^5 + C_3^5 = 598509.$$

The moments about the mean are found by forming $C_1^2 / C_0^1 = 2916/729 = 4$. Then $a = -4$ and the multipliers are 1; $-4$, 5; 16, $-39$, 25; $-64$, 229, $-284$, 125; 256, $-1199$, 2171, $-1829$, 625; etc., so that, with $\bar{x} = x - m$,

$$\sum_0^6 \bar{x} f_x = 0; \qquad \sum_0^6 \bar{x}^2 f_x = 972; \qquad \sum_0^6 \bar{x}^3 f_x = -324; \qquad \sum_0^6 \bar{x}^4 f_x = 3564.$$

Since the values of $\nabla^{s+1}\underline{(x - C_1^2/C_0^1)^{s}}$ are expressible in terms of $C_0^1$ and $C_1^2$, it follows that the values of $\sum_0^k \bar{x}^{s} f_x$ are expressible in terms of cumulations. For example a formula for the second moment about the mean, which is essentially one given by Whittaker and Robinson [7, p. 193], is

ajjs /n2\2 

(19) + 

a Oi 

However the general method described above, supplemented with the techniques of succeeding sections, is preferred to the development and use of such formulas.
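
The moments about the mean for the problem of Table I can be reproduced as follows. This is a sketch under the assumptions already stated in the text: the mean is taken as the ratio of the entry of the second cumulation opposite $x = 1$ to the entry of the first cumulation opposite $x = 0$, and the multipliers are those of (15) with $a = -m$. The helper names are hypothetical, and exact rational arithmetic is used so that a non-integral mean would also be handled.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

freq_asc = [1, 12, 60, 160, 240, 192, 64]        # f_x for x = 0..6 (Table I)

def tail_of_cumulation(order, length):
    """Entries of the order-th cumulation opposite x = 0, 1, ..., length - 1."""
    column = freq_asc[::-1]
    for _ in range(order):
        column = list(accumulate(column))
    return column[::-1][:length]

def multipliers(s, a):
    return [sum((-1) ** t * comb(s + 1, t) * (a + x - t) ** s for t in range(x + 1))
            for x in range(s + 1)]

total = tail_of_cumulation(1, 1)[0]              # first cumulation at x = 0
first_sum = tail_of_cumulation(2, 2)[1]          # second cumulation at x = 1
mean = Fraction(first_sum, total)

central = [sum(m * c for m, c in zip(multipliers(s, -mean),
                                     tail_of_cumulation(s + 1, s + 1)))
           for s in range(1, 5)]
print(mean, central)
```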

6. Recursion Property of the Multipliers. It is not readily apparent from Table III how the multipliers of the $(s + 1)$-th cumulations can be obtained from the multipliers of the $s$-th cumulations. It is possible to establish a recursion formula which is useful for this purpose. Now, for $0 \le x \le s$,

$$\nabla^{s+1}\underline{(a + x)^{s}} = (a + x)^{s} + \sum_{t=1}^{x}(-1)^{t}\binom{s+1}{t}(a + x - t)^{s}$$

$$(a + x)\,\nabla^{s}\underline{(a + x)^{s-1}} = (a + x)^{s} + \sum_{t=1}^{x}(-1)^{t}\binom{s}{t}(a + x - t)^{s-1}(a + x)$$

$$(s + 1 - a - x)\,\nabla^{s}\underline{(a + x - 1)^{s-1}} = \sum_{t=1}^{x}(-1)^{t-1}\binom{s}{t-1}(a + x - t)^{s-1}(s + 1 - a - x)$$

and since

$$\binom{s}{t}(a + x) - \binom{s}{t-1}(s + 1 - a - x) = \binom{s+1}{t}(a + x - t)$$

it follows that

$$\nabla^{s+1}\underline{(a + x)^{s}} = (a + x)\,\nabla^{s}\underline{(a + x)^{s-1}} + (s + 1 - a - x)\,\nabla^{s}\underline{(a + x - 1)^{s-1}}. \tag{20}$$

When $a = 0$ we have

$$\nabla^{s+1}\underline{x^{s}} = x\,\nabla^{s}\underline{x^{s-1}} + (s + 1 - x)\,\nabla^{s}\underline{(x - 1)^{s-1}}. \tag{21}$$

Formulas (20) and (21), though somewhat formidable in appearance, are easy to apply. Thus $\nabla^{3}\underline{(a + 2)^{2}} = (a + 2)\,\nabla^{2}\underline{(a + 2)} + (1 - a)\,\nabla^{2}\underline{(a + 1)}$. The recursion formula is especially useful in building up tables of multipliers. The following form is recommended:

As successive columnar headings use the values $a$, $a + 1$, $a + 2$, etc. and as successive row headings use $1 - a$, $2 - a$, $3 - a$, etc. Then $\nabla\underline{a^{0}} = 1$ is placed in the upper left cell, $\nabla^{2}\underline{a}$ directly below $\nabla\underline{a^{0}}$, $\nabla^{2}\underline{(a + 1)}$ to the right of $\nabla\underline{a^{0}}$, etc. The values of $\nabla^{3}\underline{(a + x)^{2}}$ are placed in the next diagonal, etc. If this process is continued the entry $\nabla^{s+1}\underline{(a + x)^{s}}$ will have the entry $\nabla^{s}\underline{(a + x)^{s-1}}$ directly above it and the entry $\nabla^{s}\underline{(a + x - 1)^{s-1}}$ on its left. Also the columnar heading is $a + x$ and the row heading $s + 1 - a - x$ so that any entry is obtained by adding the product of the entry above it and the columnar heading to the product of the entry to the left and the row heading. The values of $\nabla^{s+1}\underline{x^{s}}$ are obtained by placing $a = 0$. They are presented, in Table IV, through $s = 8$.


TABLE IV

Values of ∇^{s+1} x^s (range function)

 s+1-x \ x    0    1     2      3       4       5      6     7    8

     1        1    1     1      1       1       1      1     1    1
     2        0    1     4     11      26      57    120   247
     3        0    1    11     66     302    1191   4293
     4        0    1    26    302    2416   15619
     5        0    1    57   1191   15619
     6        0    1   120   4293
     7        0    1   247
     8        0    1

The table is easily extended to higher values of $s$. If a table of values of $\nabla^{s+1}\underline{(x + 1)^{s}}$ is constructed, it will be found to be like Table IV with columns and rows interchanged. Hence the values of $\nabla^{s+1}\underline{(x + 1)^{s}}$ are obtained from Table IV by reading the multipliers down the diagonal. Thus the values $\nabla^{4}\underline{(x + 1)^{3}}$ are 1, 4, 1, 0, etc.

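The ease with which the multipliers may be computed is illustrated with $a = -4$. In this case we have

TABLE V

Values of ∇^{s+1} (x + a)^s (range function) with a = -4

  row \ column    -4       -3       -2       -1      0

       5           1        5       25      125    625
       6          -4      -39     -284    -1829
       7          16      229     2171
       8         -64    -1199
       9         256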

These values agree with those computed more laboriously in section 5. 
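
The tabular scheme of this section amounts to a short loop. The sketch below assumes the recursion takes the form "new entry = (a + x) times the entry of the previous order at the same x, plus (s + 1 - a - x) times the entry of the previous order at x - 1", with missing entries treated as 0; the function name `multiplier_rows` is hypothetical.

```python
def multiplier_rows(a, s_max):
    """rows[s][x] is the multiplier of order s at position x, for x = 0, ..., s."""
    rows = [[1]]                                   # s = 0: the single multiplier 1
    for s in range(1, s_max + 1):
        prev = rows[-1] + [0]                      # the order s - 1 entry at x = s is 0
        row = []
        for x in range(s + 1):
            left = prev[x - 1] if x > 0 else 0     # entry for (a + x - 1)^(s - 1)
            row.append((a + x) * prev[x] + (s + 1 - a - x) * left)
        rows.append(row)
    return rows

for row in multiplier_rows(0, 4):      # multipliers about the smallest variate
    print(row)
for row in multiplier_rows(-4, 4):     # multipliers for a = -4
    print(row)
```

Each row is built from the previous one with two multiplications per entry, which is why the scheme was convenient for hand computation.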

7. Value of $\sum_0^s \nabla^{s+1}\underline{(x + a)^{s}}$. It is to be noted in Tables III, IV, V that the sum of the entries in the diagonal having $s + 1$ terms is $s!$. This is generally true and results from the fact that

$$\sum_0^k \nabla^{s+1}\underline{(x + a)^{s}} = \sum_0^s \nabla^{s+1}\underline{(x + a)^{s}} = s!. \tag{22}$$

In obtaining the values of $\sum_0^k \nabla^{s+2}\underline{(x + a)^{s+1}}$ from the value of $\sum_0^k \nabla^{s+1}\underline{(x + a)^{s}}$ it is noted that $\nabla^{s+1}\underline{(x + a)^{s}}$ is used but twice. Once it is multiplied by $a + x$ and once by $s + 1 - a - x$ so that the net result is a multiplication by $s + 1$. It follows that $\sum_0^k \nabla^{s+2}\underline{(x + a)^{s+1}} = (s + 1)\sum_0^k \nabla^{s+1}\underline{(x + a)^{s}}$ and since $\sum_0^k \nabla^{2}\underline{(x + a)} = 1$, $\sum_0^k \nabla^{3}\underline{(x + a)^{2}} = 2!$, so that in general $\sum_0^k \nabla^{s+1}\underline{(x + a)^{s}} = s!$.

This property is useful in checking the values of the computed multipliers.
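
As a worked instance of this check, the multipliers already quoted for $a = 0$ and for $a = -4$ give

$$\begin{aligned}
s = 3,\ a = 0:&\quad 0 + 1 + 4 + 1 = 6 = 3!,\\
s = 4,\ a = 0:&\quad 0 + 1 + 11 + 11 + 1 = 24 = 4!,\\
s = 3,\ a = -4:&\quad -64 + 229 - 284 + 125 = 6 = 3!,\\
s = 4,\ a = -4:&\quad 256 - 1199 + 2171 - 1829 + 625 = 24 = 4!.
\end{aligned}$$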

8. The adaptation of the Charlier check. An adaptation of the Charlier check serves as an excellent check for the computed moments. It is recalled that the Charlier check gives

$$\sum_a^{a+k} (x + 1)^{s} f_x = \sum_{i=0}^{s}\binom{s}{i}\sum_{x=a}^{a+k} x^{i} f_x. \tag{23}$$

The components of the right hand member are computed by cumulative totals as indicated above. The left hand member is obtained by applying different multipliers to the same cumulated frequencies. Thus $\sum_a^{a+k}(x + 1)^{s} f_x = \sum_0^k (x + a + 1)^{s} f_{x+a}$ and the multipliers of the cumulated frequencies are $\nabla^{s+1}\underline{(x + a')^{s}}$ where $a' = a + 1$. If $a = 0$ the Charlier check multipliers are the values $\nabla^{s+1}\underline{(x + 1)^{s}}$ which can be read from Table IV. For example $\sum_0^6 (x + 1)^4 f_x = C_0^5 + 11C_1^5 + 11C_2^5 + C_3^5 = 598509$ and this checks with $\sum_0^6 x^4 f_x + 4\sum_0^6 x^3 f_x + 6\sum_0^6 x^2 f_x + 4\sum_0^6 x f_x + \sum_0^6 f_x = 278316 + 231984 + 75816 + 11664 + 729 = 598509$.
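
The check can be sketched as follows for the data of Table I. The left member is computed from the cumulated frequencies with the multipliers for $a' = 1$; the right member is formed here from direct power sums purely for comparison. The function names are illustrative and not taken from the paper.

```python
from itertools import accumulate
from math import comb

freq_asc = [1, 12, 60, 160, 240, 192, 64]        # f_x for x = 0..6 (Table I)

def power_sum(i):
    """Direct value of sum x^i f_x, used only as the comparison side of the check."""
    return sum(x ** i * f for x, f in enumerate(freq_asc))

def shifted_moment_by_cumulation(s):
    """sum (x + 1)^s f_x from the (s+1)-th cumulation with the a' = 1 multipliers."""
    column = freq_asc[::-1]
    for _ in range(s + 1):
        column = list(accumulate(column))
    tail = column[::-1][: s + 1]
    mult = [sum((-1) ** t * comb(s + 1, t) * (1 + x - t) ** s for t in range(x + 1))
            for x in range(s + 1)]
    return sum(m * c for m, c in zip(mult, tail))

s = 4
left = shifted_moment_by_cumulation(s)
right = sum(comb(s, i) * power_sum(i) for i in range(s + 1))
print(left, right)
```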

9. Application to factorial moments. When $u_x = f_x$, $v_x = x^{(s)} = x(x - 1)(x - 2)\cdots(x - s + 1)$,

$$\sum_0^k x^{(s)} f_x = \sum_0^k C^{s+1}f_x\,\nabla^{s+1}\underline{x^{(s)}}$$

and since $\nabla^{s+1}\underline{x^{(s)}}$ is 0 when $s < x \le k$, is $s!$ when $x = s$, is 0 when $0 \le x < s$,

$$\sum_0^k x^{(s)} f_x = \sum_s^k x^{(s)} f_x = s!\,C_s^{s+1}. \tag{24}$$

It follows that the underscored terms of Table I, when multiplied by $s!$, give the factorial moments. Factorial moments, first used by Sheppard [4], have since come into prominence largely because of this ease of computation.

The coefficients of $(a + b)^{x}$ are $1, x, \dfrac{x(x - 1)}{2!}, \dots, \dfrac{x^{(s)}}{s!}, \dots$. If we define the binomial moment by $B_s = \sum_0^k \dfrac{x^{(s)}}{s!} f_x$ [6, p. 278] then $B_s = \dfrac{1}{s!}\sum_0^k x^{(s)} f_x = C_s^{s+1}$.

It is also possible to show that the entries under the main diagonal are binomial moments. In Table I, for example, we let $a = 1$ and add the additional row $a = 0$ with 0 frequency. Then $C_1^1 = 729$, $C_0^1 = 729$, $C_0^2 = 729 + 3645 = 4374$, etc. The new diagonal terms are directly under the old diagonal terms and give $B_{s,1} = \sum_1^7 \dfrac{x^{(s)}}{s!} f_x = \sum_0^6 \dfrac{(x + 1)^{(s)}}{s!} f_x$. In general the terms $B_{s,l}$ are given $l$ rows below the terms $B_s$ and the factorial moments are $s!\,B_{s,l}$. Then

$$F_{s,l} = s!\,C_{s-l}^{s+1}. \tag{25}$$

For example in the problem of Table I, $F_{4,3} = \sum_3^9 x^{(4)} f_x = 4!\,C_1^5 = 782{,}784$. The method is especially adapted to the use of Hollerith machines, for positive integral values of $l$, since it is only necessary to have the machine continue its cumulation.
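
The relation between diagonal entries and factorial moments can be verified for Table I with a short sketch. It assumes that the $s$-th factorial moment equals $s!$ times the entry of the $(s+1)$-th cumulation opposite $x = s$, and that shifting the origin by $l$ moves the entry $l$ rows down; the helper names are hypothetical.

```python
from itertools import accumulate
from math import factorial

freq_asc = [1, 12, 60, 160, 240, 192, 64]        # f_x for x = 0..6 (Table I)

def cumulation_entry(order, x):
    """Entry of the order-th cumulation opposite variate x."""
    column = freq_asc[::-1]
    for _ in range(order):
        column = list(accumulate(column))
    return column[::-1][x]

def falling(x, s):
    """x(x - 1)...(x - s + 1), with the empty product equal to 1."""
    product = 1
    for i in range(s):
        product *= x - i
    return product

def factorial_moment(s, shift=0):
    """Direct value of sum (x + shift)^(s) f_x."""
    return sum(falling(x + shift, s) * f for x, f in enumerate(freq_asc))

for s in range(5):
    print(s, factorial_moment(s), factorial(s) * cumulation_entry(s + 1, s))

# Shifted case quoted in the text: s = 4, l = 3 uses the entry opposite x = s - l = 1.
print(factorial_moment(4, shift=3), factorial(4) * cumulation_entry(5, 1))
```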





10. The cumulations of $x f_x$. It is possible to use the cumulations of $x f_x$ in securing the values of the moments. Now

$$\sum_a^{a+k} x^{s+1} f_x = \sum_0^k (x + a)^{s+1} f_{x+a} = \sum_0^k (x + a) f_{x+a}\,(x + a)^{s} = \sum_0^k C^{s+1}\big[(x + a) f_{x+a}\big]\,\nabla^{s+1}\underline{(x + a)^{s}}. \tag{26}$$

When $a = 0$, (26) becomes

$$\sum_0^k x^{s+1} f_x = \sum_0^k C^{s+1}(x f_x)\,\nabla^{s+1}\underline{x^{s}}. \tag{27}$$

We compute the cumulations of $x f_x$ for the problem of Table I. These are given in Table VI.


TABLE VI

Cumulations of x f_x

  x    f_x   x f_x    C^1     C^2     C^3      C^4

  6     64    384     384     384     384      384
  5    192    960    1344    1728    2112     2496
  4    240    960    2304    4032    6144     8640
  3    160    480    2784    6816   12960    21600
  2     60    120    2904    9720   22680    44280
  1     12     12    2916   12636   35316    79596
  0      1      0    2916   15552   50868   130464

Underscored (diagonal) entries: 2916, 9720, 12960, 8640.


so that

$$\sum_0^6 x f_x = 2916; \qquad \sum_0^6 x^2 f_x = 12636; \qquad \sum_0^6 x^3 f_x = 35316 + 22680 = 57996;$$

$$\sum_0^6 x^4 f_x = 79596 + 4(44280) + 21600 = 278316.$$
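
The computation from the cumulations of $x f_x$ can be sketched as below. The assumption is that the $(s+1)$-th moment is obtained by applying the order-$s$ multipliers for $a = 0$ to the last $s + 1$ entries of the $(s+1)$-th cumulation of $x f_x$; the function name `moment_from_xf` is illustrative.

```python
from itertools import accumulate
from math import comb

freq_asc = [1, 12, 60, 160, 240, 192, 64]            # f_x for x = 0..6 (Table I)
xf_asc = [x * f for x, f in enumerate(freq_asc)]     # the column x f_x of Table VI

def moment_from_xf(s):
    """sum x^(s+1) f_x from the (s+1)-th cumulation of x f_x."""
    column = xf_asc[::-1]
    for _ in range(s + 1):
        column = list(accumulate(column))
    tail = column[::-1][: s + 1]                     # entries opposite x = 0, ..., s
    mult = [sum((-1) ** t * comb(s + 1, t) * (x - t) ** s for t in range(x + 1))
            for x in range(s + 1)]
    return sum(m * c for m, c in zip(mult, tail))

print([moment_from_xf(s) for s in range(4)])
```

Compared with cumulating the frequencies themselves, one less cumulation is needed for a given moment, and the multipliers are those of the next lower order.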

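In getting moments about the mean from the cumulations of $x f_x$, the following method is recommended. With $\bar{x} = x - m$,

$$\sum_0^k \bar{x}^{s+1} f_x = \sum_0^k \bar{x}^{s}(x - m) f_x = \sum_0^k \bar{x}^{s} x f_x - m\sum_0^k \bar{x}^{s} f_x \tag{28}$$

and

$$\sum_0^k \bar{x}^{s} x f_x = \sum_0^k C^{s+1}(x f_x)\,\nabla^{s+1}\underline{(x - m)^{s}}. \tag{29}$$

When $s = 1$, (28) gives $\sum_0^k \bar{x}^{2} f_x = \sum_0^k \bar{x}\,x f_x - m\sum_0^k \bar{x} f_x$ and, since $\sum_0^k \bar{x} f_x = 0$,

$$\sum_0^k \bar{x}^{2} f_x = \sum_0^k \bar{x}\,x f_x. \tag{30}$$

In the illustrative problem $a = -4$ so that

$$\sum_0^6 \bar{x}\,x f_x = -4(15552) + 5(12636) = 972$$

$$\sum_0^6 \bar{x}^{2} x f_x = 16(50868) - 39(35316) + 25(22680) = 3564$$

$$\sum_0^6 \bar{x}^{3} x f_x = 2268$$

and

$$\sum_0^6 \bar{x}^{2} f_x = 972; \qquad \sum_0^6 \bar{x}^{3} f_x = 3564 - 4(972) = -324; \qquad \sum_0^6 \bar{x}^{4} f_x = 2268 - 4(-324) = 3564.$$

Formula (30) is of note since it permits the determination of $\sum_0^k \bar{x}^{2} f_x$ directly from the cumulations of $x f_x$.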

The factorial moments are also related to the cumulations of $x f_x$. Thus

$$\sum_0^k x^{(s)} f_x = \sum_0^k (x - 1)^{(s-1)} x f_x = \sum_0^k C^{s}(x f_x)\,\nabla^{s}\underline{(x - 1)^{(s-1)}} \tag{31}$$

which results in $\sum_0^k x^{(s)} f_x = (s - 1)!\,C_s^{s}(x f_x)$.

It follows that

$$C_s^{s}(x f_x) = s\,C_s^{s+1}(f_x).$$

For example, the underscored terms of Table VI are respectively 1, 2, 3, 4 times underscored terms of Table I.

In general the cumulations of $x f_x$, rather than of $f_x$, are recommended since $C(x f_x)$ can be computed and recorded almost as quickly as $C(f_x)$, since one less cumulation is needed to obtain a specific moment, and since the multipliers needed to get a specific moment are smaller. A technique based on the cumulations of $x f_x$ is especially adapted to the use of Hollerith machines. Let us take $x_x$ to represent the sum of the $x$'s for all items in the distribution having the same value of $x$. Then $x f_x = x_x$ and we have

$$\sum_a^{a+k} x^{s} f_x = \sum_a^{a+k} x^{s-1} x_x = \sum_a^{a+k} C^{s}(x_x)\,\nabla^{s}\underline{(x^{s-1})}. \tag{32}$$

If the cards are sorted for $x$ and the tabulator is wired to print cumulative totals each time $x$ changes, the recording tape gives the successive values of $C(x_x)$. (Care must be taken that there are no blank values of $x$.)

If a summary punch is available, these cumulations are punched on cards as they are cumulated and these summary cards are used in getting higher cumulations.

If no summary punch is available, it is possible to obtain $\sum x^2 f_x$ by the application of Theorem I. Thus

$$\sum_a^{a+k} x^{2} f_x = \sum_a^{a+k} x\,x_x = \sum_a^{a+k} C(x_x)\,\nabla\underline{(x)},$$

and since $\nabla\underline{(x)} = a$ when $x = a$ and $\nabla\underline{(x)} = 1$ when $x > a$, it follows that $\sum_a^{a+k} x^{2} f_x$ can be obtained by adding the entries above the last and then adding the last entry multiplied by $a$. This is essentially the Mendenhall-Warren-Hollerith method of getting $\sum x^{2}$ [9, p. 27].

In case $a = 0$ the technique amounts simply to adding all the entries above the bottom one.

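The value $\sum x^{3} f_x$ can be obtained similarly from the first order cumulations. Thus

$$\sum_a^{a+k} x^{3} f_x = \sum_a^{a+k} x^{2} x_x = \sum_a^{a+k} C(x_x)\,\nabla\underline{(x^{2})} \tag{33}$$

and since $\nabla\underline{(x^{2})} = a^{2}$ when $x = a$, $\nabla\underline{(x^{2})} = 2x - 1$ when $x > a$, it follows that

$$\sum_a^{a+k} x^{3} f_x = a^{2}\,C(x_x)\Big|_{x=a} + \sum_{a+1}^{a+k} C(x_x)(2x - 1). \tag{34}$$

When $a = 0$, (34) becomes

$$\sum_0^k x^{3} f_x = \sum_1^k C(x_x)(2x - 1) \tag{35}$$

so that the multipliers are the successive odd integers. Thus from the first order cumulations of Table VI we have

$$\sum_0^6 x f_x = 2916; \qquad \sum_0^6 x^{2} f_x = 12636; \qquad \sum_0^6 x^{3} f_x = 57996.$$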
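
The first-order technique just described can be sketched as follows, assuming variates $a, a+1, \dots, a+k$ and taking the backward differences of $x$ and $x^2$ as $a$ and $a^2$ at $x = a$, and $1$ and $2x - 1$ above it. The function name `low_moments_from_first_cumulation` is hypothetical.

```python
from itertools import accumulate

def low_moments_from_first_cumulation(freq_asc, a=0):
    """Return (sum x f, sum x^2 f, sum x^3 f) using only the first cumulation of x f_x."""
    xs = range(a, a + len(freq_asc))
    xf = [x * f for x, f in zip(xs, freq_asc)]
    # c[i] is the cumulation of x f_x, started at the largest variate, opposite x = a + i.
    c = list(accumulate(xf[::-1]))[::-1]
    sum_x = c[0]
    sum_x2 = a * c[0] + sum(c[1:])
    sum_x3 = a * a * c[0] + sum(ci * (2 * x - 1) for x, ci in zip(xs, c) if x > a)
    return sum_x, sum_x2, sum_x3

freq_asc = [1, 12, 60, 160, 240, 192, 64]        # Table I, smallest variate first
print(low_moments_from_first_cumulation(freq_asc))
print(low_moments_from_first_cumulation(freq_asc, a=2))
```

No higher cumulations are needed here, which is the situation described in the text when a summary punch is not available.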

The cumulative method can also be applied to the method of digiting [17, p. 425].

It is also possible to obtain the moments from the cumulations of $x^{2} f_x$, $x^{3} f_x$, etc., since

$$\sum_a^{a+k} x^{s+2} f_x = \sum_a^{a+k} x^{s}(x^{2} f_x) = \sum_a^{a+k} C^{s+1}(x^{2} f_x)\,\nabla^{s+1}\underline{(x^{s})}$$

$$\sum_a^{a+k} x^{s+3} f_x = \sum_a^{a+k} x^{s}(x^{3} f_x) = \sum_a^{a+k} C^{s+1}(x^{3} f_x)\,\nabla^{s+1}\underline{(x^{s})}$$

but the cumulations of $x f_x$ are preferable for most purposes. The Charlier check works in all cases. It should be noted that the indicated Hollerith technique demands only the customary tabulator and not the expensive, time consuming, card punching multiplier [16].

11. Product Moments, Correlation. It is possible to apply the cumulative technique in getting product moments involving two variables. If we let $y_x$ be the sum of all the values of $y$ having the same value of $x$, then

$$\sum_a^{a+k} x^{s} y f_{xy} = \sum_a^{a+k} y_x x^{s} = \sum_a^{a+k} C^{s+1}(y_x)\,\nabla^{s+1}\underline{(x^{s})} \tag{36}$$

so that the multipliers are the same as those previously used. When Hollerith machines are used, it is only necessary to sort the cards for $x$ and to wire the machine to give cumulations on variables $x$, $y$, $z$, etc. If the machine is adjusted to take totals with each change in $x$, the tape records simultaneously the values of $C(x_x)$, $C(y_x)$, $C(z_x)$, etc. With a summary punch it is possible to form successive cumulations easily. The values $\sum x^{s+1}$, $\sum x^{s} y$, $\sum x^{s} z$, etc. are then obtained by applying the multipliers. When $s = 1$, (36) becomes

$$\sum_a^{a+k} x y f_{xy} = \sum_a^{a+k} C^{2}(y_x)\,\nabla^{2}\underline{(x)} \tag{37}$$

so that the multipliers are $a$, $1 - a$, 0, 0, etc. When $a = 0$, the multipliers are 0, 1, 0, 0, etc. and when $a = 1$, they are 1, 0, 0, etc.

When no summary punch is available, it is necessary to obtain the values of the moments from the first order cumulations. Using Theorem I

$$\sum_a^{a+k} x y f_{xy} = \sum_a^{a+k} C(y_x)\,\nabla\underline{(x)} = a\,C(y_x)\Big|_{x=a} + \sum_{a+1}^{a+k} C(y_x). \tag{38}$$

This formula serves as the basis of the Mendenhall-Warren-Hollerith Correlation Method [9, p. 27].

It can be shown in similar fashion that

$$\sum_a^{a+k} x^{2} y f_{xy} = a^{2}\,C(y_x)\Big|_{x=a} + \sum_{a+1}^{a+k} C(y_x)(2x - 1) \tag{39}$$

and when $a = 0$

$$\sum_0^k x^{2} y f_{xy} = \sum_1^k C(y_x)(2x - 1). \tag{40}$$

The method is also adapted to the common problem of finding correlation coefficients from grouped data when Hollerith machines are not available and this method is recommended for the determination of these coefficients.
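
The first-order product-moment formulas for $a = 0$ can be illustrated on a small two-way table. The joint frequencies below are hypothetical and chosen only for illustration; `y_x` holds the sum of $y$ over all items with a given $x$, and `c` is its cumulation started at the largest $x$.

```python
from itertools import accumulate

# Hypothetical joint frequencies f[x][y] for x = 0..3 and y = 0..2 (illustrative only).
f = [[2, 1, 0],
     [1, 3, 1],
     [0, 2, 4],
     [0, 1, 2]]

# y_x: sum of the y values of all items having this x.
y_x = [sum(y * n for y, n in enumerate(row)) for row in f]

# First-order cumulation of y_x, started at the largest x; c[x] is the entry opposite x.
c = list(accumulate(y_x[::-1]))[::-1]

sum_xy = sum(c[1:])                                               # entries above the bottom one
sum_x2y = sum(ci * (2 * x - 1) for x, ci in enumerate(c) if x >= 1)   # odd-integer multipliers

print(sum_xy, sum_x2y)
```

The same cumulated column serves for both product moments; only the multipliers (ones, then successive odd integers) change.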

An illustration is presented in Table VII which shows the correlation existing between college first semester average, $X$, and preparatory school average, $Y$, for 1126 students entering the College of Literature, Science and the Arts of the University of Michigan in 1928. The coded values of $X$ and $Y$ are indicated by $x$ and $y$ and are positive integers beginning with 0. The coded values are given in descending order beginning with the upper left hand corner of the chart. The values of the cumulations are placed at the right hand side and at the bottom of the chart.


TABLE VII 

Correlation with cumulative totals 


I to 

(2) 

(3) 

(4) | (5) | (6) | (7) | (8) | 

(9) 

(10) 

(11) | (12) 

(13) 

(14) 

X 



B 

111 

SB 

n 

Jgj 

2.49 

2.00- 

H 

£m 

SI 

m 

1 







8 

■ 7 

6 

5 

4 

3 

2 

1 

| 



■ 

■ 

S9 

13 

H 

107 

220 

341 

179 

121 



Cx v 

Cy* 

4.00 

6 

18 

5 

2 

5 

5 

■ 

■ 

■ 

B 

1 

113 

■ 

3.99 

3.50- 

5 

106 

2 

19 

29 

27 

I 

7 

■ 

1 

1 

673 

638 

3.49 

3,00- 

4 

178 

3 

12 

35 

53 

44 

18 

1 

5 

2 

1503 

1350 

2.99 

2.50- 

8 

270 

3 



56 

103 

33 

27 

ii 

8 

2568 

2160 

2.49 

2.00- 

2 

330 

■ 


11 

54 

1 

07 

46 

19 

13 

3714 


LOO 

1.50- 

1 

173 

I 

1 

5 

19 

45 

IB 


18 

7 

4244 

2993 

1,49 

1.00- 

0 

51 

1 

1 

2 

1 

■ 

10 

8 

6 

4 

4399 

2993 



Cy, 

61 

259 

661 


2194 

2678 

Bjj^ 

2923 

2993 

12815 

■ 

Ki 



Cka 

104 


IK 

ttim 


3660 


4339 

u 

IKii 

■ 

IKii 

20245 

E 


The lower right hand comer has the entries 


2 a 
Ev E^ 
E* Tuf 


Lv I 
Ly\ 
2/-*, 


where E X V, E V, and E * are obtained by adding 
the cumulations in the columns or rows involved. 


The values $C(y_y)$ are easily computed from columns (2) and (3). The values of $C(x_y)$ are computed by forming the cumulated product of the row frequency and $x$. The values are recorded when the products contributed by a given row have been computed. The values $C(y_x)$ and $C(x_x)$ are obtained similarly.

The value of $r$ is easily obtained from the lower right hand entries. The value $A_{x,y} = N\sum xy - (\sum x)(\sum y)$ is obtained from diagonal entries, $A_{x,x} = N\sum x^{2} - (\sum x)^{2}$ is obtained from entries in the last row, $A_{y,y} = N\sum y^{2} - (\sum y)^{2}$ is obtained from the last column, and

$$r = \frac{A_{x,y}}{\sqrt{A_{x,x}\,A_{y,y}}}$$

is easily computed. In the above problem

r = .441.
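
The final step can be sketched numerically. The figures below are assumptions read from this copy of Table VII, whose body is only partly legible: the row frequencies for $y = 0, \dots, 6$ (column 3), the total 4399 taken as $\sum x$, and the corner totals 12815 and 20245 taken as $\sum xy$ and $\sum x^2$. The variable names are illustrative.

```python
from math import sqrt

n = 1126
f_y = [51, 173, 330, 270, 178, 106, 18]          # row frequencies for y = 0..6 (assumed reading)

sum_y = sum(y * f for y, f in enumerate(f_y))
sum_y2 = sum(y * y * f for y, f in enumerate(f_y))
sum_x, sum_x2, sum_xy = 4399, 20245, 12815       # assumed readings of the corner totals

a_xy = n * sum_xy - sum_x * sum_y
a_xx = n * sum_x2 - sum_x ** 2
a_yy = n * sum_y2 - sum_y ** 2

r = a_xy / sqrt(a_xx * a_yy)
print(round(r, 3))
```

Under these readings the result agrees with the value of $r$ stated in the text, which gives some support to the readings themselves.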

The values $M_x$, $M_y$, $\sigma_x$, $\sigma_y$ are also easily obtained from the lower right hand
entries. The successive steps are indicated by the form



'Lx 

22/ 



M v 

2 y 

2 xy 





Lx 

2» 8 

N 

■4*,* 






Ax,v 

>/A.y lV 

ffy 




VA.,z 

V'da.ap Ay fV 


M, 



<Tx 


r 


Recent methods of applying cumulative totals theory to correlation are found in references [9], [14], [17], [18], [19].

The third order moments are obtained by multiplying the entries of $C(x_y)$, $C(y_y)$, $C(x_x)$, $C(y_x)$ by 1, 3, 5, etc. as indicated by (40). Thus $\sum x^{3} f_x = 4399 + 3(4339) + \text{etc.} = 102{,}103$; $\sum x^{2} y f_{xy} = 63121$; $\sum x y^{2} f_{xy} = 46047$; $\sum y^{3} f_y = 38{,}633$. It is hence possible to compute the skewness of each marginal distribution from Table VII. See also [18, p. 657].

12. Conclusion. This paper presents an outline of the computation of 
moments with the use of cumulative totals and columnar multipliers. Basic 
general theorems are derived and applications are made to one variable and two 
variable distributions both with and without punched card equipment. The 
formulas assume that the distance between successive variates (or class marks) 
is unity, but the reader should have no trouble in adapting the formulas to more 
general problems. 

In the interest of brevity the development is limited to the descending cumulations. It is possible to parallel the development here by deriving formulas in terms of ascending cumulations. It is also possible to work out formulas showing relations between columnar, row, and diagonal multipliers. There are

other applications such as to the evaluation of E **> which are of interest. It is 

possible also that applications may be found for the general theory of sections 
2 and 3 which do not demand that v x be a power function. 


The University of Michigan.

















PAUL S. DWYER


REFERENCES 

[1] G. F. Lipps, "Die Theorie der Kollektivgegenstände," Philosophische Studien (Wundt, Editor), Vol. 17, (1901) pp. 407-578.

[2] G. F. Hardy, Theory of the Construction of Tables of Mortality, pp. 59-62 and 124-128.

[3] W. P. Elderton, Frequency Curves and Correlation, pp. 19-23.

[4] W. F. Sheppard, "Factorial Moments in Terms of Sums or Differences," Proc. of London Math. Society, 2, Vol. 13, (1913) pp. 81-90. Also, "Fitting Polynomials by the Method of Least Squares," ibid., pp. 97-108.

[5] J. Steffensen, "Factorial Moments and Discontinuous Frequency Functions," Skandinavisk Aktuarietidskrift, 6, (1923) pp. 73-89.

[6] J. Steffensen, Interpolation, pp. 93-104.

[7] E. T. Whittaker and G. Robinson, Calculus of Observations, pp. 191-194.

[8] R. Frisch, "Sur le calcul numérique des moments ordinaires et des moments composés d'une distribution statistique," Skandinavisk Aktuarietidskrift, Vol. 10, (1927) pp. 81-91.

[9] R. M. Mendenhall and R. Warren, "The Mendenhall-Warren-Hollerith Correlation Method," Columbia University Statistical Bureau Document No. 1, 1929, Columbia University, New York, 48 pp.

[10] R. M. Mendenhall and R. Warren, "Computing Statistical Coefficients from Punched Cards," Jour. of Ed. Psy., Vol. 21, (1930) pp. 53-62.

[11] C. Jordan, "Approximation and Graduation According to the Principle of Least Squares by Orthogonal Polynomials," Annals of Math. Stat., 3, (1932) pp. 257-357.

[12] A. C. Aitken, "On the Graduation of Data by the Orthogonal Polynomials of Least Squares," Proc. of Roy. Soc. of Edin., Vol. 53, pp. 54-78.

[13] A. C. Aitken, "On Fitting Polynomials to Weighted Data by Least Squares," Proc. of Roy. Soc. of Edin., Vol. 54, (1933-34) pp. 1-11.

[14] Chen-nan Li, "Summation Method of Fitting Parabolic Curves and Calculating Linear and Curvilinear Coefficients on a Scatter Diagram," Jour. of Am. Stat. Assn., 29, (1934) pp. 405-409.

[15] M. Sasuly, Trend Analysis of Statistics, Chap. VIII. Also page 5.

[16] H. C. Carver, "Uses of the Automatic Multiplying Punch"; Punched Card Method in Colleges and Universities, pp. 417-422.

[17] A. E. Brandt, "Uses of the Progressive Digit Method"; Punched Card Method in Colleges and Universities, pp. 423-436.

[18] P. S. Dwyer and A. D. Meacham, "The Preparation of Correlation Tables on a Tabulator Equipped with Digit Selection," Jour. Am. Stat. Assn., Vol. 32, (1937) pp. 654-662.

[19] W. N. Durost and H. M. Walker, Durost-Walker Correlation Chart, World Book Co., N. Y. (1938).

[20] H. L. Rietz, Mathematical Statistics, 1927.



A NOTE ON THE DERIVATION OF FORMULAE FOR MULTIPLE 
AND PARTIAL CORRELATION* 

By Louis Guttman 

1. Multiple Correlation. Let the measurements of N individuals on each of the n variables $x_1, x_2, \cdots, x_n$ be expressed as relative deviates; that is, such that

$$\Sigma x_k = 0, \qquad \Sigma x_k^2 = N, \qquad k = 1, 2, 3, \cdots, n,$$

where the summations extend over the N individuals.

If values of $\lambda_k$ are determined so that

$$\Sigma(x_1 - \lambda_2x_2 - \lambda_3x_3 - \cdots - \lambda_nx_n)^2 \text{ is a minimum,}$$

and if we let

$$X_1 = \lambda_2x_2 + \lambda_3x_3 + \cdots + \lambda_nx_n, \tag{1}$$

then the multiple correlation coefficient, obtained from the regression of $x_1$ on the remaining $n - 1$ variables, is defined as

$$r_{1.23\cdots n} = r_{x_1X_1}.$$

The square of the standard error of estimate of $x_1$ on the remaining $n - 1$ variables is defined as

$$\sigma^2_{1.23\cdots n} = \frac{1}{N}\Sigma(x_1 - X_1)^2.$$

The minimizing values for $\lambda_k$ are obtained from the normal equations

$$\Sigma(x_1 - X_1)x_k = 0, \qquad k = 2, 3, \cdots, n, \tag{2}$$

which may be written in expanded notation as

$$\begin{aligned}
\lambda_2 + r_{23}\lambda_3 + r_{24}\lambda_4 + \cdots + r_{2n}\lambda_n &= r_{12} \\
r_{32}\lambda_2 + \lambda_3 + r_{34}\lambda_4 + \cdots + r_{3n}\lambda_n &= r_{13} \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
r_{n2}\lambda_2 + r_{n3}\lambda_3 + r_{n4}\lambda_4 + \cdots + \lambda_n &= r_{1n}
\end{aligned}$$

where $r_{jk} = \frac{1}{N}\Sigma x_jx_k = r_{kj}$, $r_{kk} = 1$.

* The notions involved in this demonstration are certainly well-known. However, 
the directness and simplicity of the derivations may lend some merit to their exhibition. 
The writer is indebted to Professor Dunham Jackson for useful advice. 






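From Cramer's rule it is seen that

$$\lambda_k = -\frac{R_{1k}}{R_{11}}, \qquad \text{if } R_{11} \neq 0,$$

where $R_{jk}$ is the cofactor of $r_{jk}$ (or of $r_{kj}$) in the symmetric determinant

$$R = \begin{vmatrix}
r_{11} & r_{12} & r_{13} & \cdots & r_{1n} \\
r_{21} & r_{22} & r_{23} & \cdots & r_{2n} \\
\cdots & \cdots & \cdots & r_{jk} & \cdots \\
r_{n1} & r_{n2} & r_{n3} & \cdots & r_{nn}
\end{vmatrix}.$$

Summing both sides of (1) over the N individuals shows that $\Sigma X_1 = 0$, so that the variance of $X_1$ is

$$\sigma^2_{X_1} = \frac{1}{N}\Sigma X_1^2.$$

From (2), the residual $(x_1 - X_1)$ is orthogonal to each of the $x_k$ except $x_1$; therefore the residual is orthogonal to any linear combination of these $x_k$ and in particular to $X_1$; that is,

$$\Sigma(x_1 - X_1)X_1 = 0, \tag{3}$$

or

$$r_{x_1X_1}\sigma_{X_1} = \sigma^2_{X_1},$$

and therefore

$$r_{x_1X_1} = \sigma_{X_1}. \tag{4}$$

Multiplying both sides of (1) by $x_1$ and summing over the individuals, we get:

$$r_{x_1X_1}\sigma_{X_1} = r_{12}\lambda_2 + r_{13}\lambda_3 + \cdots + r_{1n}\lambda_n = -\frac{1}{R_{11}}(r_{12}R_{12} + r_{13}R_{13} + \cdots + r_{1n}R_{1n}) = 1 - \frac{R}{R_{11}}.$$

From (4) then,

$$r_{1.23\cdots n} = \sqrt{1 - \frac{R}{R_{11}}}.$$

It is clear that in general

$$r_{k.12\cdots(k-1)(k+1)\cdots n} = \sqrt{1 - \frac{R}{R_{kk}}}.$$

To find the standard error of estimate, expand

$$\sigma^2_{1.23\cdots n} = 1 - 2r_{x_1X_1}\sigma_{X_1} + \sigma^2_{X_1} = \frac{R}{R_{11}}.$$

In general, when $\sigma_k = 1$,

$$\sigma^2_{k.12\cdots(k-1)(k+1)\cdots n} = \frac{R}{R_{kk}}. \tag{5}$$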
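The cofactor formulas of this section can be checked numerically. The sketch below is only an illustration: the helper names (`det`, `cofactor`, `multiple_correlation`) and the three-variable correlation matrix are assumptions made for the example, indices are zero-based (so index 0 stands for $x_1$), and the determinant is computed by Laplace expansion, which is adequate only for small n.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total

def cofactor(m, i, j):
    """Signed minor obtained by striking out row i and column j."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)

def multiple_correlation(corr, k=0):
    """Return (weights, multiple r, squared standard error) for variable k."""
    big_r = det(corr)
    r_kk = cofactor(corr, k, k)
    # lambda_j = -R_kj / R_kk for every j other than k
    weights = {j: -cofactor(corr, k, j) / r_kk
               for j in range(len(corr)) if j != k}
    r_multiple = math.sqrt(1.0 - big_r / r_kk)   # sqrt(1 - R / R_kk)
    sigma_squared = big_r / r_kk                 # equation (5)
    return weights, r_multiple, sigma_squared

# Made-up correlation matrix: r12 = .5, r13 = .4, r23 = .3.
corr = [[1.0, 0.5, 0.4],
        [0.5, 1.0, 0.3],
        [0.4, 0.3, 1.0]]
weights, r_multiple, sigma_squared = multiple_correlation(corr)
print(weights, r_multiple, sigma_squared)
```

For this matrix the weights satisfy the two normal equations, and the squared multiple correlation and the squared standard error of estimate add to one, as (4) and (5) require.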

2. Partial Correlation. If values of $\mu_k$ and $\nu_k$ are determined so that

$$\Sigma(x_1 - \mu_3x_3 - \mu_4x_4 - \cdots - \mu_nx_n)^2 \text{ is a minimum}$$

and

$$\Sigma(x_2 - \nu_3x_3 - \nu_4x_4 - \cdots - \nu_nx_n)^2 \text{ is a minimum,}$$

and if we let

$$\begin{aligned}
Y_1 &= \mu_3x_3 + \mu_4x_4 + \cdots + \mu_nx_n \\
Y_2 &= \nu_3x_3 + \nu_4x_4 + \cdots + \nu_nx_n,
\end{aligned} \tag{6}$$

then the partial correlation coefficient between $x_1$ and $x_2$, holding the remaining $n - 2$ variables constant, is defined as

$$r_{12.34\cdots n} = r_{(x_1 - Y_1)(x_2 - Y_2)};$$

and since $\Sigma(x_k - Y_k) = 0$,

$$r_{12.34\cdots n} = \frac{\frac{1}{N}\Sigma(x_1 - Y_1)(x_2 - Y_2)}{\sigma_{1.34\cdots n}\,\sigma_{2.34\cdots n}}. \tag{7}$$

Each $\mu_k$ is the negative of the ratio of the cofactor of $r_{1k}$ to the cofactor of $r_{11}$ in the determinant obtained by striking out the second row and the second column from R. We shall use the notation $R_{hi \cdot jk}$ to mean the algebraic complement of the second order minor in R, whose complement is obtained by striking out row h and column i and then row j and column k. Then

$$\mu_k = -\frac{R_{22 \cdot 1k}}{R_{22 \cdot 11}}.$$

By argument similar to that used in (3),

$$\Sigma(x_1 - Y_1)Y_2 = 0,$$

or

$$\Sigma x_1Y_2 = \Sigma Y_1Y_2.$$

Similarly,

$$\Sigma x_2Y_1 = \Sigma Y_1Y_2.$$

Then the numerator of the right member of (7) becomes, after expanding and collecting terms,

$$r_{12} - r_{x_2Y_1}\sigma_{Y_1}. \tag{8}$$

Multiplying both sides of the first equation of (6) by $x_2$ and summing over the N individuals, we have,

$$r_{x_2Y_1}\sigma_{Y_1} = r_{23}\mu_3 + r_{24}\mu_4 + \cdots + r_{2n}\mu_n = -\frac{1}{R_{22 \cdot 11}}(r_{23}R_{22 \cdot 13} + r_{24}R_{22 \cdot 14} + \cdots + r_{2n}R_{22 \cdot 1n}) = r_{12} + \frac{R_{12}}{R_{22 \cdot 11}}. \tag{9}$$

Analogous to (5), we have,

$$\sigma^2_{1.34\cdots n} = \frac{R_{22}}{R_{22 \cdot 11}}, \qquad \sigma^2_{2.34\cdots n} = \frac{R_{11}}{R_{11 \cdot 22}}. \tag{10}$$

From (8), (9), and (10) the right member of (7) becomes

$$\frac{-R_{12}}{\sqrt{R_{11}R_{22}}}.$$

It is seen that in general

$$r_{jk.12\cdots(j-1)(j+1)\cdots(k-1)(k+1)\cdots n} = \frac{-R_{jk}}{\sqrt{R_{jj}R_{kk}}}.$$
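A brief numerical check of the general result may be helpful. It is a sketch under stated assumptions: the helper names and the correlation matrix are made up for the example, indices are zero-based, and the comparison value is the familiar three-variable formula $(r_{12} - r_{13}r_{23})/\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}$, which is not derived in this note.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total

def cofactor(m, i, j):
    """Signed minor obtained by striking out row i and column j."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)

def partial_correlation(corr, j, k):
    """-R_jk / sqrt(R_jj R_kk): correlation of j and k, all others held constant."""
    return -cofactor(corr, j, k) / math.sqrt(
        cofactor(corr, j, j) * cofactor(corr, k, k))

# Made-up correlation matrix: r12 = .5, r13 = .4, r23 = .3.
corr = [[1.0, 0.5, 0.4],
        [0.5, 1.0, 0.3],
        [0.4, 0.3, 1.0]]
from_cofactors = partial_correlation(corr, 0, 1)
classical = (0.5 - 0.4 * 0.3) / math.sqrt((1 - 0.4 ** 2) * (1 - 0.3 ** 2))
print(from_cofactors, classical)
```

The two values agree, which is what the cofactor expression reduces to when n = 3.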


University of Minnesota.



NOTE ON REGRESSION FUNCTIONS IN THE CASE OF THREE 
SECOND ORDER RANDOM VARIABLES 

By Clyde A. Bridger 

The study of the correlation of two second-order random variables has received
the attention of several authors, among them Yule [1], Charlier [2],
Wicksell [3, 4], and Tschuprow [5]. Yule writes of them under the guise of 
“attributes.” The study of three or more second order random variables has 
lagged behind. In this note we shall examine the regression function of one 
second order random variable on two others by considering the problem from 
the point of view of Tschuprow’s [6] paper on the correlation of three random 
variables. 

A variable X that takes on m values $x_1, \cdots, x_m$ with corresponding probabilities $p_1, \cdots, p_m$ subject to the condition $\sum_i p_i = 1$ is defined as a random variable of order m. (In particular, if X takes on only two values, $x$ and $x'$, with probabilities p and q, where p + q = 1, X is a random variable of second order.) The system of values x and probabilities p constitute the law of distribution of X. In the case of two random variables, X and Y, there exists a joint distribution law, covering all possible combinations of X and Y, together with their associated probabilities $p_{11}, \cdots, p_{mn}$; the joint distribution law contains all of the information regarding the stochastical dependence of X and Y.

The extension to more than two variables is immediate. Let $p_{ijk}$ represent the probability of the simultaneous occurrence of the set of values $x_i$, $y_j$, $z_k$ of three random variables X, Y, and Z; $p_{ij}$, that of the simultaneous occurrence of $x_i$, $y_j$ together without reference to Z; $p_i$, that of the occurrence of $x_i$ without reference to Y or Z; etc. Then, we have relationships of the types

$$\sum_i\sum_j\sum_k p_{ijk} = \sum_i\sum_j p_{ij} = \sum_i p_i = 1; \qquad \sum_k p_{ijk} = p_{ij}; \qquad \sum_j\sum_k p_{ijk} = \sum_j p_{ij} = \sum_k p_{ik}.$$

Similarly, let $p_{jk}^{(i)}$ be the probability of the simultaneous occurrence of $y_j$ and $z_k$ on the condition that X takes on the value $x_i$; $p_j^{(i)}$, that of the occurrence of $y_j$ without reference to Z, on the same condition; etc. Then

$$\sum_j p_j^{(i)} = \sum_k p_k^{(i)} = \sum_j\sum_k p_{jk}^{(i)} = 1; \qquad \sum_k p_{jk}^{(i)} = p_j^{(i)}; \qquad p_ip_j^{(i)} = p_{ij};$$

$$p_{ij}p_k^{(ij)} = p_ip_{jk}^{(i)} = p_ip_j^{(i)}p_k^{(ij)} = p_{ijk}; \qquad \sum_i p_ip_j^{(i)} = p_j; \quad \text{etc.}$$

Denoting by E(x) or simply Ex the expression “the mean value or mathematical expectation of x,” we have $m_{fgh} = EX^fY^gZ^h = \sum_i\sum_j\sum_k p_{ijk}x_i^fy_j^gz_k^h$. In particular, the mean values of the distributions are given by $m_x = EX = \sum_i p_ix_i$, $m_y = EY = \sum_j p_jy_j$, $m_z = EZ = \sum_k p_kz_k$. Then we may write

$$\mu_{fgh} = E(X - m_x)^f(Y - m_y)^g(Z - m_z)^h = \sum_i\sum_j\sum_k p_{ijk}(x_i - m_x)^f(y_j - m_y)^g(z_k - m_z)^h.$$

The quantities $\mu$ may be identified as terms in the expression for the moments for the sum of three variables as follows: $E(u + v + w)^n = Eu^n + nEu^{n-1}v + \cdots + kEu^fv^gw^h + \cdots + Ew^n$, where $f + g + h = n$. If n = 2, we have the variance of the sum of three variables given by $\mu_{2..} + 2\mu_{11.} + \mu_{.2.} + 2\mu_{.11} + 2\mu_{1.1} + \mu_{..2}$, where the dots in the subscripts indicate variables not considered. Thus $\mu_{2..}$ refers to the second moment of the distribution of the variable X about its mean, $m_x$, without consideration of the distributions of Y or Z. If every term of the expansion of the n-th moment of the sum of three variables is divided by the quantity $\sqrt{\mu_{2..}^f\,\mu_{.2.}^g\,\mu_{..2}^h}$, the expansion takes the “normal form.” The type term is

$$r_{fgh} = \frac{\mu_{fgh}}{\sqrt{\mu_{2..}^f\,\mu_{.2.}^g\,\mu_{..2}^h}}.$$

In the case of one variable, $r_f = \mu_f/\sqrt{\mu_2^f}$, so $r_1 = 0$, $r_2 = 1$, etc. In the case of two variables, $r_{1.} = r_{.1} = 0$, $r_{2.} = r_{.2} = 1$, $r_{11}$ = Pearson's product-moment coefficient of correlation, etc. Functions of parameters r will serve to characterize the law of correlation among the variables.

By writing the expressions with superscript (i) to denote that the values of the distributions of Y and Z are those which correspond to a fixed value $x_i$ of the distribution of X, we have $m_y^{(i)} = (EY)^{(i)}$, $m_z^{(i)} = (EZ)^{(i)}$, $\mu_{gh}^{(i)} = E^{(i)}(Y - m_y^{(i)})^g(Z - m_z^{(i)})^h$, and $r_{gh}^{(i)}$ is formed from $\mu_{gh}^{(i)}$ in the same way that $r_{fgh}$ is formed from $\mu_{fgh}$. (For g = h = 1, $r_{11}^{(i)}$ becomes the conditional coefficient of correlation between Y and Z for $X = x_i$.) Thus it follows that we can study the correlation between Y and Z for each value of X separately.

For second order random variables, some changes in notation can be made. Let $p_x$ and $p_{x'}$ be the probabilities corresponding to the values $x$ and $x'$, respectively, of X; $p_y$ and $p_{y'}$ correspond to $y$ and $y'$, respectively; $p_z$ and $p_{z'}$ correspond to $z$ and $z'$, respectively. Also, let $p_{xy}$ represent the probability of the simultaneous occurrence of x and y together without reference to the distribution of Z, etc., and $p_{xyz}$ represent the probability of the simultaneous occurrence of all three values, x, y, z, of their respective distributions, etc. Then, $p_x + p_{x'} = p_y + p_{y'} = p_z + p_{z'} = 1$; $p_{xy} + p_{xy'} = p_x$; $p_{xyz} + p_{xyz'} + p_{xy'z} + p_{xy'z'} = p_x$; etc.

Let us set up a system of normal coordinates in which the values $U_i$ along the U-axis are defined by $U_i = \dfrac{x_i - m_x}{\sqrt{\mu_{2..}}}$, those along the V-axis by $V_j = \dfrac{y_j - m_y}{\sqrt{\mu_{.2.}}}$, and those along the W-axis by $W_k = \dfrac{z_k - m_z}{\sqrt{\mu_{..2}}}$. Let $m_z^{(ij)}$ represent the mean of the set of values of the Z distribution which correspond to the fixed pair of values, $(x_i, y_j)$, of the X and Y distributions. Then, in the new coordinate system, the same thing is given by $M_W^{(ij)} = \dfrac{m_z^{(ij)} - m_z}{\sqrt{\mu_{..2}}}$. Now, the series of values obtained by giving i and j different values for the pair $(U_i, V_j)$ determine what is called the regression function of W on U and V (or, in the original notation, the surface of regression of the distribution of Z on the distributions of X and Y). Similarly, the values of $M_W^{(ij)}$ obtained by fixing U and varying V in the set $(U_i, V_j)$ determine what is called the conditional line of regression of W on V for a fixed value of U. With these definitions we shall consider the problem of finding a regression function of W on U and V for three second order random variables.

For convenience, write $\delta_{xy} = p_{xy} - p_xp_y$, $\delta_{xz} = p_{xz} - p_xp_z$, $\delta_{yz} = p_{yz} - p_yp_z$,

a V = PxPxyt ' Pxti Pxt I ** = Vxyt VxvVn fin ~ Px'Px'yi ~ Px'i/Px’t) 8xw = 

Pv^xt ~ pxfiyt ® if “ Px&vx~ P*S*v = c* Pt/Sxt — PxSxji- Direct substi¬ 
tutions into the several formulas developed above then gives us the represent¬ 
ative forms to be used in subsequent calculations: 

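$$x - m_x = p_{x'}(x - x'), \qquad x' - m_x = -p_x(x - x'),$$

$$m_x = p_xx + p_{x'}x', \qquad r_{1..} = 0, \qquad r_{2..} = 1, \qquad r_{3..} = \frac{p_{x'} - p_x}{\sqrt{p_xp_{x'}}},$$

$$r_{4..} = \frac{1}{p_xp_{x'}} - 3, \qquad r_{11.} = \frac{\delta_{xy}}{\sqrt{p_xp_{x'}p_yp_{y'}}}, \qquad r_{21.} = r_{11.}r_{3..},$$

$$r_{12.} = r_{11.}r_{.3.}, \qquad r_{13.} = r_{11.}r_{.4.}, \qquad r_{22.} = r_{11.}r_{3..}r_{.3.} + 1,$$

$$r_{12.}r_{21.} = r_{11.}(r_{22.} - 1), \qquad r_{211} = r_{3..}r_{111} + r_{.11},$$

$$r_{121} = r_{.3.}r_{111} + r_{1.1}, \qquad r_{112} = r_{..3}r_{111} + r_{11.},$$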


rm = 


VPxPx'PvPv’P.P, 


, Vx- 


VPxP, 


Ul * 


Hfr 010 = 


Pw Vp»p, 


, Jtfr' U0 = 


VP*P #' 9 


~P* 

Vv*p/ 


^y# "" nr («•) 

V = -7==j jMlF = - y-= ~ > 

Pi'v V Pi Pi' P*V Vftfr 

[ Mf 110 ]* 1 ' 0 -^=, --| g =, 

pxi/ VPmP«' P«'» VP.'*P«v 

tw u °] w " ) = - - - ~^g-— 

PiV' V Pxt pxt' P*v V P#'»P»'i' 
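Several of the representative forms for two-valued variables can be checked numerically. The sketch below is illustrative only: the joint probabilities, the helper names, and the convention that index 0 denotes the unprimed (larger) value are assumptions made for the example. It compares direct expectations in normal coordinates with the closed forms $r_{3..} = (p_{x'} - p_x)/\sqrt{p_xp_{x'}}$, $r_{4..} = 1/(p_xp_{x'}) - 3$, $r_{21.} = r_{11.}r_{3..}$, and $r_{22.} = r_{11.}r_{3..}r_{.3.} + 1$.

```python
import math

def normal_coords(p):
    """Standardized values (unprimed, primed) when the unprimed value has probability p."""
    s = math.sqrt(p * (1.0 - p))
    return ((1.0 - p) / s, -p / s)

# Hypothetical joint probabilities for (X, Y); index 0 = unprimed, 1 = primed value.
joint = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.5}
p_x = joint[(0, 0)] + joint[(0, 1)]
p_y = joint[(0, 0)] + joint[(1, 0)]
U = normal_coords(p_x)
V = normal_coords(p_y)

def r(f, g):
    """Mixed moment E[U^f V^g] taken directly over the joint distribution."""
    return sum(p * U[i] ** f * V[j] ** g for (i, j), p in joint.items())

delta_xy = joint[(0, 0)] - p_x * p_y
r3_formula = ((1.0 - p_x) - p_x) / math.sqrt(p_x * (1.0 - p_x))
r4_formula = 1.0 / (p_x * (1.0 - p_x)) - 3.0
r11_formula = delta_xy / math.sqrt(p_x * (1.0 - p_x) * p_y * (1.0 - p_y))

print(r(3, 0), r3_formula)
print(r(4, 0), r4_formula)
print(r(1, 1), r11_formula)
print(r(2, 1), r(1, 1) * r(3, 0))
print(r(2, 2), r(1, 1) * r(3, 0) * r(0, 3) + 1.0)
```

Each printed pair should agree to rounding error, because a standardized two-valued variable satisfies $U^2 = 1 + r_{3..}U$.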

In the case of correlation of two second order random variables, a linear regression function can always be found [3, 5]. Similarly, the conditional regression functions in the case of three second order random variables can always be taken as linear. If we take as the form of the regression function of W on U and V the form $M_W^{(ij)} = aU_i + bV_j + cU_iV_j + d$, where a, b, c, d are constants to be determined by direct substitution for $U_i$ and $V_j$ from the distributions of X and Y, it is seen that linearity of all total and conditional regression functions is preserved. By total regression function, we mean the regression of W on U or W on V.

Now consider the problem of finding a, b, c, d. The direct substitution provides us with four linearly dependent equations in four unknowns. Linear combinations reduce the set to three, from which the relationship $d = -cr_{11.}$ is obtained. By building up the various terms in the equations through dividing by the necessary values of p, the parameters r can be made to appear. Further combinations now reduce the set to the following three:

$$\begin{aligned}
r_{111} &= ar_{21.} + br_{12.} + c(r_{22.} - r_{11.}^2) \\
r_{.11} &= ar_{11.} + b + cr_{12.} \\
r_{1.1} &= a + br_{11.} + cr_{21.}
\end{aligned}$$

The solution gives

$$\begin{aligned}
a &= \frac{r_{1.1} - r_{11.}r_{.11}}{1 - r_{11.}^2} - \frac{r_{21.} - r_{11.}r_{12.}}{1 - r_{11.}^2}\,c = a' - a''c \\
b &= \frac{r_{.11} - r_{11.}r_{1.1}}{1 - r_{11.}^2} - \frac{r_{12.} - r_{11.}r_{21.}}{1 - r_{11.}^2}\,c = b' - b''c \\
c &= (1 - r_{11.}^2)(r_{111} - a'r_{21.} - b'r_{12.}) \div \Delta,
\end{aligned}$$

where

$$\Delta = \begin{vmatrix}
1 & r_{11.} & r_{21.} \\
r_{11.} & 1 & r_{12.} \\
r_{21.} & r_{12.} & r_{22.} - r_{11.}^2
\end{vmatrix}.$$

The regression function becomes $M_W^{(ij)} = a'U_i + b'V_j - c(r_{11.} + a''U_i + b''V_j - U_iV_j)$. If c = 0 the surface is a plane. Examination of the characteristics of $r_{111}$ shows that generally c cannot be zero. The vanishing of c implies that special relations must exist between $p_{xyz}$ and $p_{xy}$, $p_{xz}$, $p_{yz}$.
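The solution can be exercised on a small example. The following is a sketch under stated assumptions, not the author's computation: the eight joint probabilities are invented, index 0 denotes the unprimed value of each variable, and the helper names are hypothetical. It forms the parameters r directly as expectations in normal coordinates, solves for a, b, c, d, and compares the fitted function with the conditional means of W for each pair $(U_i, V_j)$.

```python
import math

# Hypothetical joint probabilities p[(i, j, k)] for three two-valued variables.
P = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.05, (0, 1, 1): 0.10,
    (1, 0, 0): 0.05, (1, 0, 1): 0.20, (1, 1, 0): 0.25, (1, 1, 1): 0.20,
}

def marginal(axis):
    """Probability that the variable on this axis takes its unprimed value."""
    return sum(p for idx, p in P.items() if idx[axis] == 0)

def normal_coords(p):
    """Standardized values (unprimed, primed) of a two-valued variable."""
    s = math.sqrt(p * (1.0 - p))
    return ((1.0 - p) / s, -p / s)

U, V, W = (normal_coords(marginal(axis)) for axis in range(3))

def r(f, g, h):
    """Parameter r_{fgh} = E[U^f V^g W^h]."""
    return sum(p * U[i] ** f * V[j] ** g * W[k] ** h for (i, j, k), p in P.items())

r11, r21, r12, r22 = r(1, 1, 0), r(2, 1, 0), r(1, 2, 0), r(2, 2, 0)
r111, r_011, r_101 = r(1, 1, 1), r(0, 1, 1), r(1, 0, 1)

denom = 1.0 - r11 ** 2
a1 = (r_101 - r11 * r_011) / denom      # a'
a2 = (r21 - r11 * r12) / denom          # a''
b1 = (r_011 - r11 * r_101) / denom      # b'
b2 = (r12 - r11 * r21) / denom          # b''
delta = (r22 - r11 ** 2) * denom - r12 ** 2 - r21 ** 2 + 2.0 * r11 * r12 * r21
c = denom * (r111 - a1 * r21 - b1 * r12) / delta
a, b, d = a1 - a2 * c, b1 - b2 * c, -c * r11

max_error = 0.0
for i in range(2):
    for j in range(2):
        cell = sum(P[(i, j, k)] for k in range(2))
        conditional_mean = sum(P[(i, j, k)] * W[k] for k in range(2)) / cell
        fitted = a * U[i] + b * V[j] + c * U[i] * V[j] + d
        max_error = max(max_error, abs(fitted - conditional_mean))
print(a, b, c, d, max_error)
```

With four constants and four pairs $(U_i, V_j)$, the fitted function reproduces the conditional means exactly, so the reported error should be at rounding level.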

Two constants of considerable importance in the theory of correlation are the multiple correlation coefficient and the multiple correlation ratio. For the regression of W on U and V, the former is defined as $R^2_{W.UV} = a'r_{1.1} + b'r_{.11}$ and the latter as $\eta^2_{W.UV} = \sum_i\sum_j p_{ij}[M_W^{(ij)}]^2$. For planar regression, the difference $\eta^2_{W.UV} - R^2_{W.UV}$ must vanish. For others, the difference takes on values characteristic of the regression function. To find the value it takes for our case, we set up the value of $\eta^2_{W.UV}$ from the regression function just given and subtract $R^2_{W.UV}$.

By direct substitution, we have

$$\eta^2_{W.UV} - R^2_{W.UV} = \sum_i\sum_j p_{ij}(aU_i + bV_j + cU_iV_j - cr_{11.})^2 - a'r_{1.1} - b'r_{.11}.$$

Since $\sum_i\sum_j p_{ij}U_i^2 = 1$, $\sum_i\sum_j p_{ij}(U_iV_j)^2 = r_{22.}$, etc., we find rather easily that
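The displayed result is not legible at this point in the present copy. As a hedged reconstruction, and not necessarily the form printed in the original, the three equations satisfied by a, b, c imply that the regression function has the same product moments with U, V, and UV as W itself. Writing $\Delta$ for the third-order determinant that appears in the solution for c, this gives:

```latex
\begin{aligned}
\eta^2_{W.UV} &= \sum_i\sum_j p_{ij}\,[M_W^{(ij)}]^2
               = a\,r_{1.1} + b\,r_{.11} + c\,r_{111}, \\
\eta^2_{W.UV} - R^2_{W.UV}
  &= (a - a')\,r_{1.1} + (b - b')\,r_{.11} + c\,r_{111}
   = c\,(r_{111} - a''r_{1.1} - b''r_{.11}) \\
  &= c\,(r_{111} - a'r_{21.} - b'r_{12.})
   = \frac{c^2\,\Delta}{1 - r_{11.}^2}.
\end{aligned}
```

In this form the difference vanishes when c = 0, which agrees with the statement that it must vanish for planar regression.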



We can also obtain the same value of $\eta^2_{W.UV} - R^2_{W.UV}$ by direct substitution for the four values of $M_W^{(ij)}$ in $\eta^2_{W.UV}$ and subtracting $R^2_{W.UV}$. To actually obtain this is a long laborious process complicated by the fact that so many alternate forms for the answer are possible, of which only one is comparable with the value previously found. The general procedure is first to set up from the definition the expression $K = \eta^2_{W.UV} - R^2_{W.UV}$.

Then we build up each square by addition and subtraction so that it will contain a $\delta_{xyz}$ term. At the close of the process, we convert the whole expression into the parameters r by dividing through by $\sqrt{p_xp_{x'}p_yp_{y'}p_zp_{z'}}$, and substituting from the list of representative forms given at the beginning of the paper. A matter of rearrangement now gives the same result as before.

From the symmetry involved, we can say that, in the case of the correlation of three second order random variables, the function representing the regression of one on the other two has an equation in normal coordinates of the form $M_W^{(ij)} = aU_i + bV_j + cU_iV_j - cr_{11.}$, where a, b, and c satisfy equations of type

$$\begin{aligned}
r_{111} &= ar_{21.} + br_{12.} + c(r_{22.} - r_{11.}^2) \\
r_{.11} &= ar_{11.} + b + cr_{12.} \\
r_{1.1} &= a + br_{11.} + cr_{21.}
\end{aligned}$$

State Division of Public Health,
Boise, Idaho.


BIBLIOGRAPHY 

[1] G. Udny Yule. An Introduction to the Theory of Statistics. London: Charles Griffin & Co., Ltd., 1922. 6th Ed.

[2] C. V. L. Charlier. "Om korrelation mellan egenskaper inom den homograda statistiken." Svenska Aktuarieföreningens Tidskrift. Vol. I (1914), pp. 21-35.

[3] S. D. Wicksell. "Some theorems in the theory of probability, with special reference to their importance in the theory of homograde correlation." Svenska Aktuarieföreningens Tidskrift. Vol. III (1916), pp. 165-213.

[4] S. D. Wicksell. "On the correlation of acting probabilities." Skandinavisk Aktuarietidskrift. Vol. I (1918), pp. 98-135.

[5] A. A. Tschuprow. Grundbegriffe und Grundprobleme der Korrelationstheorie. Leipzig: B. G. Teubner, 1925.

[6] A. A. Tschuprow. (Translation into English by L. Isserlis.) "The Mathematical Theory of the Statistical Methods Employed in the Study of Correlation in the Case of Three Variables." Transactions of the Cambridge Philosophical Society. Vol. XXIII, no. 12 (1928), pp. 337-352.











